Abstract

A standard tool for model selection in a Bayesian framework is the Bayes
factor, which compares the marginal likelihoods of the data under two given
models. In this paper, we consider the class of hierarchical loglinear
models for discrete data given in the form of a contingency table with
multinomial sampling. We assume that the Diaconis-Ylvisaker conjugate prior
is the prior distribution on the loglinear parameters and that the uniform
distribution is the prior on the space of models. Under these conditions, the
Bayes factor between two models is a function of their prior and posterior
normalizing constants. These constants are functions of the hyperparameters
$(m, \alpha)$, which can be interpreted respectively as the marginal counts
and the total count of a fictive contingency table.

We study the behaviour of the Bayes factor when $\alpha$ tends to zero. In this
study two mathematical objects play a central role. They are, first, the
interior $C$ of the convex hull $\bar{C}$ of the support of the multinomial
distribution for a given hierarchical loglinear model, together with the faces
of $\bar{C}$, and second, the characteristic function $J_C$ of the convex set
$C$. We show that, when $\alpha$ tends to 0, if the data lies on a face $F_i$ of
$\bar{C}_i$, $i = 1, 2$, of dimension $k_i$, the Bayes factor behaves like
$\alpha^{k_1 - k_2}$. This implies in particular that when the data is in $C_1$
and in $C_2$, i.e. when $k_i$ equals the dimension of model $J_i$, the sparser
model is favored, thus confirming the idea of Bayesian regularization.

In order to identify the faces of $\bar{C}$, we need to know its facets. We give
two new results. First, we identify a category of facets common to all
hierarchical models for discrete variables, not necessarily binary. Second, we
show that these facets are the only facets of $\bar{C}$ when the model is
graphical with respect to a decomposable graph.

Keywords: discrete loglinear models, Bayes factor, convex polytope, faces, 
effective degrees of freedom.

1 Introduction



We consider data given in the form of a contingency table representing the
classification of $N$ individuals according to a finite set of criteria. We
assume that the cell counts in the contingency table follow a multinomial
distribution. We also assume that the cell probabilities are modeled according
to a hierarchical loglinear model (henceforth called hierarchical model). The
multinomial distribution for the hierarchical model is a natural exponential
family of the general form $L(\theta)^{-1} \exp\langle\theta, t\rangle\,\mu(dt)$,
where $\mu$ is the generating measure and $L$ is its Laplace transform. The
Diaconis-Ylvisaker [7] conjugate prior has the general form

$$I(m,\alpha)^{-1}\, L(\theta)^{-\alpha} \exp\{\alpha\langle\theta, m\rangle\}\, d\theta \tag{1}$$

where $m$ and $\alpha$ are hyperparameters and $I(m,\alpha)$ is the normalizing
constant. Massam et al. [16] have identified and studied the Diaconis-Ylvisaker
conjugate prior for the so-called baseline constrained loglinear
parametrization of the multinomial for hierarchical models. This prior is a
generalization of the hyper Dirichlet defined by Dawid and Lauritzen [5] for
graphical models Markov with respect to decomposable graphs. Since decomposable
graphical models, and more generally graphical models, form a subclass of the
class of hierarchical models, we will call this prior the generalized hyper
Dirichlet. The hyper Dirichlet distribution is also used in discrete Bayesian
networks as a prior for the cell parameters of the multinomial distribution of
counts for the directed subgraphs formed by each discrete variable and its
parents (see [12]). For the generalized hyper Dirichlet or the hyper Dirichlet,
$\alpha$ is a positive scalar while $m$ is a vector. The scalar $\alpha$ can be
interpreted as the total sample size of a fictive contingency table and $m$ can
be interpreted as the vector of various marginal counts of the same table. It
is therefore traditional to take $\alpha$ small relative to the total data
count $N$. In this paper, we will use the loglinear parametrization for the
hierarchical model and the generalized hyper Dirichlet as the prior, as defined
in [16].

In a Bayesian framework, the Bayes factor is one of the main tools for model
selection in the class of hierarchical models. The aim of this paper is to
study the behaviour of the Bayes factor for the comparison of two hierarchical
models $J_1$ and $J_2$ when $\alpha$ is very small, i.e., when $\alpha \to 0$.
The motivation for this study is twofold. First, it has been observed that as
$\alpha \to 0$, in general, the Bayes factor will select the sparser model,
that is, the model with the parameter space of smallest dimension or
equivalently the model with the fewest interactions. This is commonly called
the phenomenon of regularization. Second, Steck and Jaakkola [20],
Proposition 1, have shown that this is not always the case and that, in fact,
the behaviour of the Bayes factor between two Bayesian networks differing by
one edge only depends upon a quantity which they call $d_{EDF}$, the effective
degrees of freedom, and which depends solely on the data. Comparing two such
Bayesian networks is equivalent to comparing two graphical models on three
variables, the saturated model and the model Markov with respect to the graph
$A_3$, i.e., the two-link chain, with one conditional independence. It is
therefore natural to seek a generalization of the results in [20] when two
arbitrary hierarchical models are considered. Our aim is to formally explain
when the sparser model is selected, when it is not, and why. We also want to
develop tools to predict the behaviour of the Bayes factor for two given
models.

Since, in the case of the Diaconis-Ylvisaker conjugate prior, the posterior
probability of model $J$ given the data is proportional to the ratio of the
posterior and prior normalizing constants, we will be led to study the
asymptotic behaviour, as $\alpha \to 0$, of the normalizing constant
$I(m,\alpha)$ in (1). In this study, two important mathematical objects will
surface. The multinomial distribution for a given hierarchical model $J$ is a
natural exponential family. We denote by $C$ the interior of the convex hull
$\bar{C}$ of the support of the measure generating this multinomial
distribution. The convex polytope $\bar{C}$, together with its faces, is the
first important mathematical object. The position of the data with respect to
$\bar{C}$, that is, whether the data is in $C$ or on one of the faces of
$\bar{C}$, will determine the behaviour of the Bayes factor. The second
important mathematical object is the characteristic function $J_C$ of this
polytope; $J_C(m)$ is defined in the literature as the volume of the polar set
of $C - m$ (see [3]). It is through $J_C$ that we will be able to find the
asymptotic behaviour of $I(m,\alpha)$. Our central statistical result is that,
as $\alpha \to 0$, the Bayes factor $B_{1,2}$ between two hierarchical models
$J_1$ and $J_2$ behaves as follows:

$$B_{1,2} \sim D\,\alpha^{k_1 - k_2} \tag{2}$$

where $D$ is a positive constant and $k_i$, $i = 1, 2$, is the dimension of the
face of $\bar{C}_i$ containing the data in its relative interior. When the data
is in both the open convex sets $C_i$, $i = 1, 2$, we have of course that

$$B_{1,2} \sim D\,\alpha^{|J_1| - |J_2|}$$

and this explains why, in general, the Bayes factor favours the sparser model
since, in general for low-dimensional tables, the data is in the open polytope
$C_i$. However, with modern genetic or sociological data, we often deal with
very sparse high-dimensional tables. In that case, the data may well be on a
face of dimension $k_i < |J_i|$. Then, as shown in [20] for three-factor
models, the sparser model is not necessarily favoured by the Bayes factor. We
do not consider, in this paper, the case $\alpha \to +\infty$ since in that
case the behaviour of $I(m,\alpha)$ is well known (see for example [19] or
[11]).

Here is a detailed description of the content of this paper. As mentioned
above, we assume a multinomial distribution for the counts and the generalized
hyper Dirichlet prior for the loglinear parameters, as given in [16]. However,
in order to efficiently describe the geometry of $\bar{C}$, we simplify the
notation in [16]. This new notation is given in Section 2.1. In Section 2.2, we
derive a characterization of the hierarchical model which is close to that
given in Proposition 3.1 of Darroch and Speed [3]. Though not central to the
derivation of our main result, this characterization strengthens our
understanding of the hierarchical model. In Section 2.3, we summarize §2 in
[16] and give a precise description of the measure $\mu$ generating the
multinomial distribution, of its support and of the interior $C$ of the convex
hull $\bar{C}$ of this support. In Section 3, we describe properties of
$J_C(m)$ and $I(m,\alpha)$. Theorems 3.1, 3.2 and 3.3 give, respectively, the
expression of $J_C(m)$ in terms of the affine forms defining the facets of
$\bar{C}$, the behaviour of $I(m,\alpha)$ when $m \in C$ and $\alpha \to 0$,
and the behaviour of $J_C(z)$ as $z$ tends to the boundary of $C$ along a
straight line. These results are used in Section 4 where we give our main
statistical result, Theorem 4.1, which yields equation (2). We thus have a
precise description of the behaviour of the Bayes factor depending on the
position of the data on the convex polytopes $\bar{C}_i$, $i = 1, 2$, of the
two models being compared. We show in Section 4.3 that our results comprise the
results in [20] as a particular case. In fact, we give a generalization of the
concept of effective degrees of freedom to allow for the comparison of two
arbitrary decomposable models. In Proposition 4.2, using the generalized
effective degrees of freedom, we give a quick and easy way to predict the
behaviour of the Bayes factor. Since faces of $\bar{C}$ can only be obtained
through the facets of $\bar{C}$, in Section 5, we return to the geometry of
$\bar{C}$ and its facets. This set of facets has already been studied in the
literature for certain binary hierarchical models (e.g. [5] or [13]). In
Theorem 5.1 we describe a category of facets common to all hierarchical models.
This constitutes a new result. For example, for the hierarchical model with
four vertices $\{a, b, c, d\}$ and all three-way interactions
$(abc), (bcd), (cda), (dab)$, no facets of $\bar{C}$ were known. In
Corollary 5.1 we show that the special category of facets given in Theorem 5.1
actually gives all the facets of $\bar{C}$ when the model is decomposable. We
conjecture that this characterizes decomposable graphical models. Finally in
Section 5.3, for the convenience of the reader, we present some known results
about the facets of $\bar{C}$ for graphical models Markov with respect to a
cycle, using the notation of the present paper.

2 Preliminaries 
2.1 The notation 

While we keep the traditional notation as given in [B] for cells and cell counts of the 
contingency table, we simplify the notation introduced in [16] for the set of nonzero 
loglinear parameters. 

Let $V$ be a finite set of indices representing $|V|$ criteria. We assume that
the criterion labelled by $v \in V$ can take values in a finite set $I_v$. We
consider $N$ individuals classified according to these $|V|$ criteria. The
resulting counts are gathered in a contingency table such that

$$I = \prod_{v \in V} I_v$$

is the set of cells $i = (i_v, v \in V)$. If $D \subset V$ and $i \in I$ we
write $i_D = (i_v, v \in D)$ for the $D$-marginal cell. We write $\mathbb{R}^I$
for the space of real functions $i \mapsto x(i)$ defined on $I$. The element
$x \in \mathbb{R}^I$ is seen sometimes as a vector, sometimes as the function
$i \mapsto x(i)$ on $I$.

Let $\mathcal{D}$ be a family of non empty subsets of $V$ such that
$D \in \mathcal{D}$, $D_1 \subset D$ and $D_1 \neq \emptyset$ implies
$D_1 \in \mathcal{D}$. In order to avoid trivialities we assume
$\cup_{D \in \mathcal{D}} D = V$. In the literature such a family $\mathcal{D}$
is called a hypergraph (see [15]) or an abstract simplicial complex (see [9])
or more simply the generating class (see [8]). Following the notation
introduced in [1], we denote by $\Omega_{\mathcal{D}}$ the linear subspace of
$x \in \mathbb{R}^I$ such that there exist functions
$\lambda_D \in \mathbb{R}^I$ for $D \in \mathcal{D}$ depending only on $i_D$
and such that $x = \sum_{D \in \mathcal{D}} \lambda_D$, that is

$$\Omega_{\mathcal{D}} = \Big\{x \in \mathbb{R}^I : \exists\, \lambda_D \in \mathbb{R}^I,\ D \in \mathcal{D}, \text{ such that } \lambda_D(i) = \lambda_D(i_D) \text{ and } x = \sum_{D \in \mathcal{D}} \lambda_D\Big\}.$$

The hierarchical model generated by $\mathcal{D}$ is the set of probabilities
$p = (p(i))_{i \in I}$ on $I$ such that $p(i) > 0$ for all $i$ and such that
$\log p \in \Omega_{\mathcal{D}}$. It is convenient to write, for $p$ in the
hierarchical model,

$$\log p(i) = \lambda_\emptyset + \sum_{D \in \mathcal{D}} \lambda_D(i) \tag{3}$$

where $\lambda_\emptyset$ does not depend on $i$ and is thus a constant.
Needless to say the representation (3) is not unique.

We now introduce the notions we will need later to express the baseline
constrained loglinear parameters used in the present paper. We first select a
special element in each $I_v$. For convenience we denote it 0. By abuse of
notation, we also denote by 0 the cell of $I$ with all its components equal
to 0. This special element in $I_v$ is denoted $r_v$ in [1] and $i^*$ in [16],
but we find the notation 0 more convenient. Actually the choice of the special
element in each $I_v$ is arbitrary and does not affect our results. If
$i \in I$, the support of $i$ is the subset of $V$ defined as

$$S(i) = \{v \in V ;\ i_v \neq 0\}.$$

We write

$$J = \{j \in I,\ S(j) \in \mathcal{D}\} \tag{4}$$

and note that since $\mathcal{D}$ does not contain the empty set, $J$ does not
contain $0 \in I$. This set $J \subset I$ is essential here and de facto
defines the hierarchical model. We introduce the important notation

$$j \triangleleft i$$

for $i \in I$ and $j \in J$ to mean that $S(j)$ is contained in $S(i)$ and that
$j_{S(j)} = i_{S(j)}$. Note that if $j, j' \in J$ and $i \in I$ we have

$$j \triangleleft j' \text{ and } j' \triangleleft i \ \Rightarrow\ j \triangleleft i. \tag{5}$$

Thus $\triangleleft$ is in particular a partial ordering for $J$ but we will
never use the notation $i \triangleleft i'$ for $i$ or $i'$ in
$I \setminus J$. Let us illustrate the notation above with an example. Let
$V = \{a, b, c\}$, $\mathcal{D} = \{a, b, c, ab, bc\}$,
$I_a = \{0, 1, 2\} = I_b$ and $I_c = \{0, 1\}$. Thus $I$ has
$3 \times 3 \times 2 = 18$ elements and

$$J = \{100, 200, 010, 020, 001, 110, 210, 120, 220, 011, 021\}$$

has 11 elements with respective supports $a, a, b, b, c, ab, ab, ab, ab, bc, bc$.
If $i = 201$ the set of $j$ in $J$ such that $j \triangleleft i$ is
$\{200, 001\}$ and if $i = 211$ this set is $\{210, 200, 011, 001, 010\}$.
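
As a quick check of this notation, the following Python sketch enumerates $I$ and $J$ for the example above and lists the $j \in J$ lying below a given cell $i$ (that is, $S(j) \subset S(i)$ with $j$ and $i$ agreeing on $S(j)$). The helper names `support`, `precedes` and `below`, and the encoding of cells as tuples ordered $(a, b, c)$, are choices made here for illustration and do not come from the paper.

```python
from itertools import product

# Levels of each criterion; the special element of each I_v is 0.
levels = {"a": (0, 1, 2), "b": (0, 1, 2), "c": (0, 1)}
variables = tuple(levels)  # ("a", "b", "c")

# Generating class {a, b, c, ab, bc}, written as frozensets of variable names.
generating_class = {frozenset(s) for s in ("a", "b", "c", "ab", "bc")}


def support(cell):
    """S(i): the set of variables whose coordinate is not 0."""
    return frozenset(v for v, value in zip(variables, cell) if value != 0)


def precedes(j, i):
    """True when S(j) is contained in S(i) and j agrees with i on S(j)."""
    # Agreement on every non-zero coordinate of j already forces S(j) inside S(i).
    return all(jv == iv for jv, iv in zip(j, i) if jv != 0)


cells = list(product(*(levels[v] for v in variables)))  # the set I
J = [cell for cell in cells if support(cell) in generating_class]


def below(i):
    """All j in J that precede the cell i."""
    return {j for j in J if precedes(j, i)}


print(len(cells), len(J))
print(sorted(below((2, 0, 1))))
print(sorted(below((2, 1, 1))))
```

Under these assumptions the script reports 18 cells, 11 elements of $J$, and the two sets quoted in the example for $i = 201$ and $i = 211$.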
2.2 The hierarchical model

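We now introduce vectors which are fundamental for the description of the
geometry of our problem. Let $(g_i)_{i \in I}$ and $(e_j)_{j \in J}$ be the
canonical bases of $\mathbb{R}^I$ and $\mathbb{R}^J$ respectively. We endow
these two spaces with their natural Euclidean structure. For all $i \in I$ we
define $f_i \in \mathbb{R}^J$ by

$$f_i = \sum_{j \in J,\ j \triangleleft i} e_j \tag{6}$$

with in particular $f_0 = 0$. Let $H$ be the linear application
$H : \mathbb{R}^I \to \mathbb{R}^J$ which sends the vector
$x = \sum_{i \in I} x(i) g_i$ of $\mathbb{R}^I$ into
$H(x) = \sum_{i \in I} x(i) f_i$ of $\mathbb{R}^J$. Then

$$H(x) = \sum_{j \in J} e_j \sum_{i \in I,\ j \triangleleft i} x(i). \tag{7}$$

The adjoint of $H$ is the linear application
$H^* : \mathbb{R}^J \to \mathbb{R}^I$ such that for all
$\theta = \sum_{j \in J} \theta_j e_j \in \mathbb{R}^J$ and all
$x \in \mathbb{R}^I$ one has

$$\langle H(x), \theta\rangle = \Big\langle \sum_{i \in I} x(i) \sum_{j \in J,\ j \triangleleft i} e_j,\ \sum_{j \in J} \theta_j e_j \Big\rangle = \sum_{i \in I} x(i) \sum_{j \in J,\ j \triangleleft i} \theta_j = \langle x, H^*(\theta)\rangle.$$

As a consequence

$$H^*(\theta) = \sum_{i \in I} g_i \sum_{j \in J,\ j \triangleleft i} \theta_j. \tag{8}$$

In other words the vector $H^*(\theta)$ of $\mathbb{R}^I$ is the function
$i \mapsto \sum_{j \in J,\ j \triangleleft i} \theta_j$. For instance for
$j \in J$ one has

$$H^*(e_j) = \sum_{i \in I,\ j \triangleleft i} g_i.$$

Suppose that $x \in \mathbb{R}^I$ is in the image of $H^*$. The expression of
$\theta \in \mathbb{R}^J$ for a given $x = H^*(\theta) \in \mathbb{R}^I$ is
given in the following lemma.

Lemma 2.1 The mapping $H^* : \mathbb{R}^J \to \mathbb{R}^I$ defined above is
injective. If $x$ is in the image $\mathrm{im}(H^*)$ of $H^*$ we have
$x = H^*(\theta)$ if and only if for all $j \in J$

$$\theta_j = \sum_{j' \in J ;\ j' \triangleleft j} (-1)^{|S(j)| - |S(j')|}\, x(j'). \tag{9}$$

In particular the vectors $(H^*(e_j))_{j \in J}$ are a basis of
$\mathrm{im}(H^*)$.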
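Proof: Let us first show the expression (9) of $\theta_j$, $j \in J$. It
follows from (8) that $x(j') = \sum_{j'' \triangleleft j'} \theta_{j''}$ and
therefore (9) is equivalent to
$\theta_j = \sum_{j' \triangleleft j} (-1)^{|S(j)| - |S(j')|} \sum_{j'' \triangleleft j'} \theta_{j''}$,
which is equivalent to

$$\theta_j = \sum_{j'' \triangleleft j} \theta_{j''} \sum_{j' ;\ j'' \triangleleft j' \triangleleft j} (-1)^{|S(j)| - |S(j')|}.$$

We therefore have to prove that for fixed $j''$ and $j$ in $J$ such that
$j'' \triangleleft j$,

$$\sum_{j' ;\ j'' \triangleleft j' \triangleleft j} (-1)^{|S(j)| - |S(j')|} = \begin{cases} 1 & \text{if } j = j'' \\ 0 & \text{if } j \neq j''. \end{cases}$$

If $j = j''$ the result is trivially true. If $j \neq j''$ and
$j'' \triangleleft j' \triangleleft j$ with $j$ fixed, then $j'$ is entirely
determined by its support $S(j')$ since $j'_{S(j')} = j_{S(j')}$. The principle
of inclusion-exclusion says that for $A \subset C$ where $C \neq A$ we have
$\sum_{A \subset B \subset C} (-1)^{|B|} = 0$. Applying this to $A = S(j'')$
and $C = S(j)$ gives the desired result and (9) is proved. Note that (9)
implies that $H^*$ is injective and that the vectors $(H^*(e_j))_{j \in J}$ are
independent. □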
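
The inversion formula (9) can be checked numerically on the example of Section 2.1. The sketch below builds $x = H^*(\theta)$ from (8) for a randomly chosen integer vector $\theta$ and recovers $\theta$ through the alternating sum in (9). The function names `H_star` and `invert`, the integer encoding of the variables $(a, b, c)$ as $(0, 1, 2)$, and the random choice of $\theta$ are assumptions made for this illustration only.

```python
from itertools import product
import random

levels = (3, 3, 2)                    # |I_a|, |I_b|, |I_c|
D = [{0}, {1}, {2}, {0, 1}, {1, 2}]   # generating class {a, b, c, ab, bc}


def S(cell):
    """Support of a cell: indices of its non-zero coordinates."""
    return {v for v, value in enumerate(cell) if value != 0}


def precedes(j, i):
    """True when S(j) is contained in S(i) and j agrees with i on S(j)."""
    return all(jv == iv for jv, iv in zip(j, i) if jv != 0)


I = list(product(*(range(k) for k in levels)))
J = [i for i in I if S(i) in D]


def H_star(theta):
    """Equation (8): x(i) is the sum of theta_j over the j in J preceding i."""
    return {i: sum(theta[j] for j in J if precedes(j, i)) for i in I}


def invert(x):
    """Equation (9): alternating sum of x(j') over the j' in J preceding j."""
    return {
        j: sum((-1) ** (len(S(j)) - len(S(jp))) * x[jp]
               for jp in J if precedes(jp, j))
        for j in J
    }


random.seed(0)
theta = {j: random.randint(-5, 5) for j in J}  # integers keep the check exact
x = H_star(theta)
recovered = invert(x)
print(recovered == theta)
```

Because only the values $x(j')$ with $j' \in J$ enter (9), the check also illustrates that $\theta$ is determined by the restriction of $x$ to $J$.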

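The following proposition characterizes $\Omega_{\mathcal{D}}$ in terms of the
linear application $H$ and its adjoint $H^*$. Its corollary describes the
hierarchical model.

Proposition 2.1 The space $\Omega_{\mathcal{D}}$ defined in Section 2.1 is the
direct sum of the 1-dimensional space $K$ of constants and the image of $H^*$
in $\mathbb{R}^I$

$$\Omega_{\mathcal{D}} = K \oplus \mathrm{im}(H^*).$$

The dimension of $\Omega_{\mathcal{D}}$ is

$$d_{\mathcal{D}} = 1 + |J| \quad \text{with} \quad |J| = \sum_{D \in \mathcal{D}} \prod_{v \in D} (|I_v| - 1).$$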

Proof: We now show $K \oplus \mathrm{im}(H^*) = \Omega_{\mathcal{D}}$. To see
this, let us consider for $D \in \mathcal{D}$ the linear space $E_D$ of
functions $i \mapsto \lambda_D(i)$ defined on $I$ and depending only on $i_D$.
This space is isomorphic to $\mathbb{R}^{I_D}$ with the notation
$I_D = \prod_{v \in D} I_v$ and therefore has dimension
$\prod_{v \in D} |I_v|$.

We now prove that a basis of the linear space $E_D$ is given by a vector
generating the space $K$ of constants and by the set of the $|I_D| - 1$ vectors
of $\mathbb{R}^I$

$$\{H^*(e_j) ;\ j \in J,\ S(j) \subset D\}.$$

To see this we observe that from the definition of $H^*$ the value of the
function $i \mapsto H^*(e_j)(i)$ is equal to 1 if $j \triangleleft i$ and to 0
if not. If furthermore $S(j) \subset D$ this function $H^*(e_j)$ is an element
of $E_D$. This is checked by writing $i = (i_D, i_{D^c})$: we have to show that
$H^*(e_j)(i)$ does not depend on $i_{D^c}$. Consider the case
$H^*(e_j)(i_D, i_{D^c}) = 1$. This is saying that
$j \triangleleft (i_D, i_{D^c})$, which implies that $S(j) \subset S(i)$ and
$j_{S(j)} = i_{S(j)}$. Recall our hypothesis $S(j) \subset D$. Now consider
$i' = (i_D, i'_{D^c})$. Clearly $j \triangleleft (i_D, i_{D^c})$ if and only if
$j \triangleleft (i_D, i'_{D^c})$, and this implies that
$H^*(e_j)(i) = H^*(e_j)(i')$. Thus $H^*(e_j) \in E_D$ when $S(j) \subset D$.
Now we recall that the vectors $\{H^*(e_j) ;\ j \in J,\ S(j) \subset D\}$ are
independent and that $K \cap \mathrm{im}(H^*) = \{0\}$. Since the dimension of
$E_D$ is $(|I_D| - 1) + 1$ the claim is proved.

To complete the proof we use the fact that
$\Omega_{\mathcal{D}} = \sum_{D \in \mathcal{D}} E_D$ (not a direct sum).
Therefore $\Omega_{\mathcal{D}} = K \oplus \mathrm{im}(H^*)$. Note that
$\mathrm{im}(H^*)$ can be seen as the subspace of the
$x \in \Omega_{\mathcal{D}}$ such that $x(0) = 0$. The basis of
$\mathrm{im}(H^*)$ being the set of vectors
$\{H^*(e_j) ;\ S(j) = D,\ D \in \mathcal{D}\}$, it then becomes clear that

$$|J| = \sum_{D \in \mathcal{D}} \prod_{v \in D} (|I_v| - 1).$$

The proposition is proved. □
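
The dimension count of Proposition 2.1 can be verified on the example of Section 2.1. The sketch below compares the counting formula for $|J|$ with a direct enumeration, and computes the rank of the matrix of $H^*$ (rows indexed by $I$, columns by $J$) with and without an extra constant column. The exact-arithmetic `rank` routine and the integer encoding of the variables are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from itertools import product
from math import prod

levels = (3, 3, 2)                    # |I_a|, |I_b|, |I_c|
D = [{0}, {1}, {2}, {0, 1}, {1, 2}]   # generating class {a, b, c, ab, bc}


def S(cell):
    return {v for v, value in enumerate(cell) if value != 0}


def precedes(j, i):
    return all(jv == iv for jv, iv in zip(j, i) if jv != 0)


I = list(product(*(range(k) for k in levels)))
J = [i for i in I if S(i) in D]

# Counting formula of Proposition 2.1.
formula = sum(prod(levels[v] - 1 for v in d) for d in D)

# Matrix of H*: entry (i, j) is 1 when j precedes i, and 0 otherwise.
rows = [[Fraction(int(precedes(j, i))) for j in J] for i in I]


def rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact)."""
    matrix = [row[:] for row in matrix]
    r = 0
    for col in range(len(matrix[0])):
        pivot = next((k for k in range(r, len(matrix)) if matrix[k][col] != 0), None)
        if pivot is None:
            continue
        matrix[r], matrix[pivot] = matrix[pivot], matrix[r]
        for k in range(r + 1, len(matrix)):
            factor = matrix[k][col] / matrix[r][col]
            if factor:
                matrix[k] = [a - factor * b for a, b in zip(matrix[k], matrix[r])]
        r += 1
    return r


# Appending the constant vector corresponds to adding the space K.
with_constant = [row + [Fraction(1)] for row in rows]
print(formula, len(J), rank(rows), rank(with_constant))
```

For this example the four printed numbers are 11, 11, 11 and 12, in agreement with $d_{\mathcal{D}} = 1 + |J|$.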
Corollary 2.1 The probability $p = (p(i), i \in I)$ belongs to the hierarchical
model generated by $\mathcal{D}$ if and only if there exist
$\theta = \sum_{j \in J} \theta_j e_j \in \mathbb{R}^J$ and a real number
$\theta_0$ such that

$$\log p = \theta_0 + H^*(\theta) \tag{10}$$

that is, for all $i \in I$,

$$\log p(i) = \theta_0 + \sum_{j \in J,\ j \triangleleft i} \theta_j.$$

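Moreover, $\theta \in \mathbb{R}^J$ is uniquely defined by

$$\theta_j = \sum_{j' \in J ;\ j' \triangleleft j} (-1)^{|S(j)| - |S(j')|} \log \frac{p(j')}{p(0)}$$

and $\theta_0$ is uniquely defined by $e^{-\theta_0} = L(\theta)$, where

$$e^{-\theta_0} = L(\theta) = 1 + \sum_{i \in I \setminus \{0\}} \exp\Big(\sum_{j \in J,\ j \triangleleft i} \theta_j\Big). \tag{11}$$

The discrete hierarchical model generated by $\mathcal{D}$ is a manifold of
dimension $|J|$.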

The results of the corollary above are not new of course. The characterization
of the hierarchical model is close to that given in Proposition 3.1 of [1] when
one chooses what is called in that paper the substitution weight function for
the averaging operator.



2.3 The multinomial distribution as a natural exponential 
family 

We will now use the proposition above to express the density of the multinomial
distribution for the hierarchical model generated by $\mathcal{D}$. We consider
a contingency table with cells $i = (i_v, v \in V) \in I$ and cell counts
$n = (n(i), i \in I)$ with $\sum_{i \in I} n(i) = N$ obtained from $N$ i.i.d.
observations of a multivariate Bernoulli variable with parameter
$(p(i), i \in I)$, i.e. with distribution $\sum_{i \in I} p(i)\,\delta_{g_i}$.
For $E \subset V$ we write $i_E \in I_E$ and
$n(i_E) = \sum_{i' \in I ;\ i'_E = i_E} n(i')$ for the $E$-marginal cell and
$E$-marginal count respectively. For the particular case $E = S(j)$, $j \in J$,
we abbreviate $n(j_{S(j)})$ as

$$t(j) = n(j_{S(j)}). \tag{12}$$

Then, using (10), (7) and (12) we have

$$\sum_{i \in I} n(i) \log p(i) = \langle \log p, n\rangle_{\mathbb{R}^I} = N\theta_0 + \langle H^*(\theta), n\rangle_{\mathbb{R}^I} = N\theta_0 + \langle \theta, H(n)\rangle_{\mathbb{R}^J} = N\theta_0 + \langle \theta, t\rangle_{\mathbb{R}^J} = N\theta_0 + \sum_{j \in J} t(j)\theta_j$$

which, using (11), we rewrite

$$\prod_{i \in I} p(i)^{n(i)} = \frac{1}{L(\theta)^N} \exp\Big\{\sum_{j \in J} t(j)\theta_j\Big\} = \exp\Big\{\sum_{j \in J} t(j)\theta_j + N\theta_0\Big\}. \tag{13}$$

The multinomial distribution for the model generated by $\mathcal{D}$ is
therefore a natural exponential family on $\mathbb{R}^J$ and is generated by a
discrete measure on $\mathbb{R}^J$ whose Laplace transform is $L(\theta)^N$.
For $f_i$ as defined in (6), we have
$L(\theta) = \sum_{i \in I} e^{\langle \theta, f_i\rangle}$ and therefore $L$
is the Laplace transform of the counting measure

$$\mu = \sum_{i \in I} \delta_{f_i} \tag{14}$$

on the set of vectors $(f_i)_{i \in I}$. This exponential family is
concentrated on a bounded set of $\mathbb{R}^J$ and therefore the set of
parameters $\theta$ for which $L$ is finite is the whole space $\mathbb{R}^J$.
Hence the family is regular in the sense of Barndorff-Nielsen [2] and Diaconis
and Ylvisaker [7]. Let $C \subset \mathbb{R}^J$ be the interior of the convex
hull of the set $(f_i)_{i \in I}$. In Corollary 2.2 below we show that the
$f_i$'s are the extreme points of its closure $\bar{C}$.
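
The identities of this subsection lend themselves to a numerical sanity check. The following sketch, again on the example of Section 2.1, computes $L(\theta) = \sum_{i \in I} e^{\langle\theta, f_i\rangle}$, the cell probabilities of (10), the marginal counts $t = H(n)$ of (12), and then compares the two sides of the log-likelihood identity leading to (13). The parameter vector `theta` and the cell counts `n` are randomly generated, hypothetical values used only for illustration.

```python
from itertools import product
from math import exp, isclose, log
import random

levels = (3, 3, 2)                    # |I_a|, |I_b|, |I_c|
D = [{0}, {1}, {2}, {0, 1}, {1, 2}]   # generating class {a, b, c, ab, bc}


def S(cell):
    return {v for v, value in enumerate(cell) if value != 0}


def precedes(j, i):
    return all(jv == iv for jv, iv in zip(j, i) if jv != 0)


I = list(product(*(range(k) for k in levels)))
J = [i for i in I if S(i) in D]

random.seed(1)
theta = {j: random.uniform(-1.0, 1.0) for j in J}   # hypothetical parameters


def inner(i):
    """<theta, f_i>, where f_i has a coordinate 1 at each j preceding i."""
    return sum(theta[j] for j in J if precedes(j, i))


L = sum(exp(inner(i)) for i in I)     # the cell 0 contributes exp(0) = 1
theta0 = -log(L)                      # e^{-theta_0} = L(theta), equation (11)
p = {i: exp(theta0 + inner(i)) for i in I}

n = {i: random.randint(0, 4) for i in I}            # hypothetical cell counts
N = sum(n.values())

# t(j): the j-th coordinate of H(n), i.e. the S(j)-marginal count of (12).
t = {j: sum(n[i] for i in I if precedes(j, i)) for j in J}

lhs = sum(n[i] * log(p[i]) for i in I)
rhs = N * theta0 + sum(t[j] * theta[j] for j in J)
print(isclose(sum(p.values()), 1.0), isclose(lhs, rhs))
```

Both comparisons hold up to floating-point error, which reflects the fact that the likelihood depends on the data only through $N$ and the marginal counts $t(j)$, $j \in J$.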

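Proposition 2.2 Let $(e_j)_{j \in J}$ be the canonical basis of $\mathbb{R}^J$,
let $(J_i)_{i \in I}$ be a family of subsets of $J$ such that
$\cup_{i \in I} J_i = J$ and let $f_i = \sum_{j \in J_i} e_j$, $i \in I$. The
extreme points of the convex hull $\bar{C}$ of the vectors $(f_i)_{i \in I}$
are the vectors $(f_i)_{i \in I}$ themselves.

Corollary 2.2 The extreme points of the convex hull of the support of the
measure $\mu$ as defined in (14) are the $f_i$, $i \in I$, as defined in (6).

The corollary is obtained by taking $J_i = \{j \in J ;\ j \triangleleft i\}$.
Let us prove Proposition 2.2.

Proof: Trivially any extreme point of $\bar{C}$ is an $f_i$ for some $i$.
Conversely let us show that for some given $i_0$, $f_{i_0}$ is an extreme point
of $\bar{C}$. Suppose that there exist non negative numbers
$(\lambda_i)_{i \in I}$ such that $\sum_{i \in I} \lambda_i = 1$ and
$f_{i_0} = \sum_{i \in I} \lambda_i f_i$. We are going to show that necessarily
$\lambda_i = 0$ if $i \neq i_0$. By the definition of the $f_i$

$$\sum_{j \in J_{i_0}} e_j = \sum_{i \in I} \Big(\lambda_i \sum_{j \in J_i} e_j\Big). \tag{15}$$

We observe first that if $\lambda_i > 0$ then $J_i \subset J_{i_0}$. If not,
there exists a $j_0 \in J_i \setminus J_{i_0}$ and therefore
$\langle f_{i_0}, e_{j_0}\rangle = 0$. But (15) contradicts this since
$\langle f_{i_0}, e_{j_0}\rangle \geq \lambda_i > 0$. Therefore, writing

$$A(i_0) = \{i \in I ;\ J_i \subset J_{i_0}\},$$

we must have $\lambda_i = 0$ if $i \notin A(i_0)$ and
$\sum_{i \in A(i_0)} \lambda_i = 1$. Then (15) becomes

$$0 = \Big(\sum_{i \in A(i_0)} \lambda_i\Big)\Big(\sum_{j \in J_{i_0}} e_j\Big) - \sum_{i \in A(i_0)} \Big(\lambda_i \sum_{j \in J_i} e_j\Big) = \sum_{i \in A(i_0)} \lambda_i \Big[\sum_{j \in J_{i_0}} e_j - \sum_{j \in J_i} e_j\Big] = \sum_{j \in J_{i_0}} e_j \Big(\sum_{i \in A(i_0),\ j \notin J_i} \lambda_i\Big).$$

Since the $e_j$'s are independent, it follows that
$\sum_{i \in A(i_0),\ j \notin J_i} \lambda_i = 0$ for each $j \in J_{i_0}$,
and since the $\lambda$'s are nonnegative, this will imply $\lambda_i = 0$ for
each $i \neq i_0$, $i \in A(i_0)$, if we can show that there exists a $j_0$
such that $j_0 \in J_{i_0} \setminus J_i$. This is clearly true and therefore
we conclude that $\lambda_i = 0$ if $i \neq i_0$. This proves that $f_{i_0}$ is
an extreme point. □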
2.4 The DY conjugate prior for the loglinear parameters



As seen in Corollary 2.1, the hierarchical model generated by $\mathcal{D}$ can
also be characterized by the set $J$ as defined in (4). Moreover, from (13), we
see that the multinomial distribution of $n$ for the model $J$ can be written
in terms of the marginal counts $t_J = (t(j), j \in J)$. It is the natural
exponential family with density, with respect to $\mu^{\otimes N}$, equal to

$$f(t_J \mid \theta, J) = C(n)\,\frac{e^{\langle \theta, t_J\rangle}}{L(\theta)^N} \tag{16}$$

where $\mu$ is as in (14) and $C(n)$ is constant with respect to $\theta$.
Following [7], the DY conjugate prior for $\theta$ indexed by $\alpha > 0$ and
by $m_J \in C$ is defined as the probability on $\mathbb{R}^J$ with density

$$\pi(\theta \mid m_J, \alpha, J) = \frac{1}{I_J(m_J, \alpha)} \times \frac{e^{\alpha\langle \theta, m_J\rangle}}{L(\theta)^{\alpha}}$$

where $I_J(m_J, \alpha)$ is the normalizing constant. This family of priors is
conjugate and the posterior probability of $\theta$ given the data
$n = (n(i))_{i \in I}$ in the contingency table is

$$\pi\Big(\theta \,\Big|\, \frac{\alpha m_J + t_J}{\alpha + N},\ \alpha + N,\ J\Big).$$

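Let $\mathcal{H}$ denote the set of all hierarchical models on the given set of
variables. If we assume that the prior distribution on $\mathcal{H}$ is
uniform, then the posterior distribution of $J$ given the data is

$$g(J \mid t) = \frac{I_J\big(\frac{\alpha m_J + t_J}{\alpha + N},\ \alpha + N\big)}{I_J(m_J, \alpha)} \Bigg/ \sum_{L \in \mathcal{H}} \frac{I_L\big(\frac{\alpha m_L + t_L}{\alpha + N},\ \alpha + N\big)}{I_L(m_L, \alpha)}.$$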

In classical Bayesian model selection, the most probable models are selected by
means of Bayes factors. More precisely, models are compared two by two by means
of the Bayes factor between model $J_1$ and model $J_2$. In our framework

$$B_{1,2} = \frac{I_2(m_2, \alpha)}{I_1(m_1, \alpha)} \times \frac{I_1\big(\frac{\alpha m_1 + t_1}{\alpha + N},\ \alpha + N\big)}{I_2\big(\frac{\alpha m_2 + t_2}{\alpha + N},\ \alpha + N\big)} \tag{17}$$

where, for the sake of simplicity, $m$, $t$, $I$ are indexed by $i = 1, 2$
rather than by $J_1, J_2$ and where $m_1$ and $m_2$ have been chosen in $C_1$
and $C_2$ respectively. The aim of the present paper is to find the limit of
$B_{1,2}$ when $\alpha \to 0$. If we assume that $n(i) > 0$ for all $i \in I$,
then $t_k/N$ is in the interior $C_k$ of $\bar{C}_k$ and under these
circumstances the second factor in the right-hand side of (17) has the finite
limit $I_1(t_1/N, N) / I_2(t_2/N, N)$. For the first factor in (17), we will
show that $I(m, \alpha) \sim J_C(m)\,\alpha^{-|J|}$ as $\alpha \to 0$, where
$J_C(m)$ will be studied in the next section. Thus when $\alpha \to 0$ the
Bayes factor is equivalent to

$$\alpha^{|J_1| - |J_2|} \times \frac{I_1(t_1/N, N)}{I_2(t_2/N, N)} \times \frac{J_{C_2}(m_2)}{J_{C_1}(m_1)}.$$

If we do not assume that $n(i) > 0$ for all $i \in I$, then $t_k/N$ might be on
the boundary of $C_k$ for at least one $k = 1, 2$ and we will have to further
study the behaviour of $I(m, \alpha)$ and $J_C(m)$. This is done in the
following section.
3 The limiting behavior of the prior normalizing
constant



We give three fundamental theoretical results in this section. We assume that
$m$ is in $C$, the interior of the convex hull $\bar{C}$ of the support of the
measure $\mu$ as defined in (14). Theorem 3.1 gives the general form of
$J_C(m)$ in terms of the affine forms defining the facets of $\bar{C}$.
Theorem 3.2 gives the limiting behaviour of $I(m, \alpha)$ when
$\alpha \to 0$ and Theorem 3.3 describes the behaviour of
$J_C(\lambda m + (1 - \lambda) y)$ when $y$ is on the boundary of $C$ and
$\lambda \to 0$.



3.1 The characteristic function of a convex set 

Given a finite dimensional real linear space $E$, let $E^*$ be its dual, that
is, the space of all linear forms $\theta$ on $E$. We write
$\langle \theta, x\rangle$ instead of $\theta(x)$ when
$(\theta, x) \in E^* \times E$. We fix a Lebesgue measure $d\theta$ on $E^*$
and a Lebesgue measure $dx$ on $E$ which must be compatible (this means that if
$e$ is a basis of $E$ and $e^*$ is the corresponding dual basis of $E^*$, the
product of the respective volumes of the two cubes built on $e$ and $e^*$ must
be one). Needless to say, when $E = \mathbb{R}^n$ we have $E^* = E$,
$\langle \cdot, \cdot\rangle$ is the usual inner product and the Lebesgue
measure is the usual one. It will however be important in the sequel to
distinguish between $E$ and $E^*$ and we therefore keep this notation.

If $C \subset E$ is an open non empty convex set not containing a line, its
polar set is

$$C^o = \{\theta \in E^* ;\ \langle \theta, x\rangle \leq 1\ \ \forall x \in C\},$$

its support function $h_C : E^* \to (-\infty, \infty]$ is

$$h_C(\theta) = \sup\{\langle \theta, x\rangle ;\ x \in C\}$$

and its characteristic function is the function $m \mapsto J_C(m)$ defined on
$C$ by

$$J_C(m) = \int_{E^*} e^{\langle \theta, m\rangle - h_C(\theta)}\, d\theta. \tag{18}$$

We note that if $C$ contained a line, we would have $h_C(\theta) = \infty$
almost everywhere and $J_C = 0$. Faraut and Koranyi [10], p. 10, define $J_C$
when $C$ is an open convex salient cone. In that case, the polar set of $C$ is
the convex cone

$$C^o = \{\theta \in E^* ;\ \langle \theta, x\rangle \leq 0\ \ \forall x \in C\} \tag{19}$$

and we have

$$h_C(\theta) = \begin{cases} 0 & \text{if } \theta \in C^o \\ +\infty & \text{if } \theta \notin C^o. \end{cases}$$

Let us mention here that when $C$ is a bounded set, $h_C(\theta)$ is finite for
all $\theta \in E^*$. We also have the following important property of
$J_C(\cdot)$.

Lemma 3.1 Let $C$ be an open convex set not containing a line and let
$m \in C$. Then $J_C(m)$ is finite.

Proof: We first give the proof for $m = 0$. Then, by assumption, 0 is an
interior point of $C$ and $h_C(\theta)$ is strictly positive for
$\theta \neq 0$. Assume that $E$ has a Euclidean structure and let $S(E)$ be
its unit sphere. Recall that the function $\theta \mapsto h_C(\theta)$ is a
continuous function. Thus, for $u = \theta / \|\theta\| \in S(E)$ we have the
equality

$$h_C(\theta) / \|\theta\| = h_C(u) = \max\{\langle u, x\rangle ;\ x \in \bar{C}\}.$$

Now the function $u \mapsto h_C(u)$ is continuous on the compact set $S(E)$:
let $K > 0$ be its minimum. The previous equality shows that

$$K \|\theta\| \leq h_C(\theta).$$

Thus if $n = \dim E$ we have

$$\int_{E^*} e^{-h_C(\theta)}\, d\theta \leq \int_{E^*} e^{-K\|\theta\|}\, d\theta = C_n \int_0^\infty e^{-Kr} r^{n-1}\, dr < \infty$$

where $C_n = 2\pi^{n/2} / \Gamma(n/2)$ is the area of $S(E)$. For the general
case $m \neq 0$ we use the fact that the support function of $C - m$ satisfies
$h_{C-m}(\theta) = -\langle \theta, m\rangle + h_C(\theta)$. □

One can prove that $J_C(m) = \infty$ if $m \notin C$. Another property of
$J_C(m)$ is that when $C$ is an open convex set of $\mathbb{R}^n$ not
containing a line, the following formulas hold

$$J_C(m) = n!\, \mathrm{Vol}\big((C - m)^o\big) = n! \int_{C^o} \frac{d\theta}{(1 - \langle \theta, m\rangle)^{n+1}}. \tag{20}$$

For the first equality in (20), see [3] p. 207 and [T] p. 243. For the second
one, make the change of variable
$\theta = \theta' / (1 + \langle \theta', m\rangle)$ in the integral
$\int_{(C - m)^o} d\theta'$.

Computing $J_C(m)$ when $C$ is associated to an arbitrary hierarchical model is
usually difficult except, as we shall see in Section 5.2, when the model is a
graphical decomposable model. Consider however the following simple example.

Example 1: the segment $(0, 1)$. Let $C = (0, 1) \subset \mathbb{R}$. In this
case, $h_C(\theta) = \max(0, \theta)$ and for $0 < m < 1$ we have

$$J_C(m) = \int_{-\infty}^0 e^{\theta m}\, d\theta + \int_0^\infty e^{\theta m - \theta}\, d\theta = \frac{1}{m} + \frac{1}{1 - m} = \frac{1}{m(1 - m)}.$$

Two more examples of $J_C(m)$ will be given after Theorem 3.2 below.
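
The closed form of Example 1 can be compared with a direct numerical evaluation of the integral (18). The sketch below approximates the integral by a midpoint rule on a truncated interval; the function name `J_numeric`, the truncation `half_width` and the number of steps are arbitrary choices made for illustration, and the neglected tails are exponentially small for $m$ away from 0 and 1.

```python
from math import exp


def h(theta):
    """Support function of C = (0, 1)."""
    return max(0.0, theta)


def J_numeric(m, half_width=80.0, steps=160_000):
    """Midpoint-rule approximation of (18) on [-half_width, half_width]."""
    width = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        theta = -half_width + (k + 0.5) * width
        total += exp(theta * m - h(theta))
    return total * width


for m in (0.2, 0.5, 0.7):
    # Numerical value next to the closed form 1 / (m (1 - m)).
    print(m, J_numeric(m), 1 / (m * (1 - m)))
```

The two columns agree to several decimal places, and both blow up as $m$ approaches 0 or 1, in line with the fact that $J_C(m) = \infty$ outside $C$.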

We now give a theorem that states that $J_C(m)$ is a ratio of polynomials where
the denominator is equal to the product of the affine forms defining the facets
of $\bar{C}$. This will be used in Section 5 to identify the facets of
$\bar{C}$ for decomposable graphical models. We first need the following lemma
which computes the characteristic function of a simplicial cone.

Lemma 3.2 Let $(x_1, \ldots, x_n)$ be a basis of $E$ and let
$(\xi_1, \ldots, \xi_n)$ be its dual basis in $E^*$ (that is
$\langle \xi_j, x_i\rangle = \delta_{ij}$). Consider the simplicial cone $A$ of
$E^*$ defined by

$$A = \{\theta = \theta_1 \xi_1 + \cdots + \theta_n \xi_n ;\ \theta_1 \geq 0, \ldots, \theta_n \geq 0\} = \{\theta \in E^* ;\ \langle \theta, x_1\rangle \geq 0, \ldots, \langle \theta, x_n\rangle \geq 0\}$$

and denote by $\mathrm{Vol}(\xi_1, \ldots, \xi_n)$ the volume of the
parallelotope

$$\{\theta = \theta_1 \xi_1 + \cdots + \theta_n \xi_n ;\ 0 \leq \theta_1 \leq 1, \ldots, 0 \leq \theta_n \leq 1\}.$$

Then for all $x$ in $-A^o \subset E$, that is the opposite of the dual cone of
$A$, we have

$$\int_A e^{-\langle \theta, x\rangle}\, d\theta = \frac{\mathrm{Vol}(\xi_1, \ldots, \xi_n)}{\prod_{i=1}^n \langle \xi_i, x\rangle}.$$

This lemma is elementary and is obtained by writing $\theta$ in the $\xi$ basis
and by making the change of variable from the coordinates of $\theta$ in the
canonical basis of $\mathbb{R}^n$ to the coordinates in the $\xi$ basis.

Recall that a facet of a polytope $\bar{C} \subset \mathbb{R}^n$ with a non
empty interior is a face of dimension $n - 1$. More specifically a facet is the
intersection of $\bar{C}$ with a supporting hyperplane of $\bar{C}$ which
contains $n$ affinely independent points of $\bar{C}$.

Theorem 3.1 Let $C \subset E$ be the non empty interior of a bounded polytope
$\bar{C}$. Let $m \in C$. Then we have

$$J_C(m) = \frac{N(m)}{D(m)}$$

where $D(m) = \prod_{k=1}^K g_k(m)$ is the product of affine forms $g_k(m)$ in
$m$ such that $g_k(m) = 0$, $k = 1, \ldots, K$, define the facets of $\bar{C}$
and where $N(m)$ is a polynomial of degree $< K$.

Proof: Let $\mathcal{E}$ be the set of extreme points of $\bar{C}$. By
Corollary 2.2 we know that $\bar{C}$ is the convex hull of $\mathcal{E}$.
Therefore for each $\theta$, there exists at least one $f \in \mathcal{E}$ such
that $h_C(\theta) = \langle \theta, f\rangle$. Define the cone of influence of
$f \in \mathcal{E}$ to be

$$A(f) = \{\theta \in E^* ;\ \langle \theta, x\rangle \leq \langle \theta, f\rangle\ \ \forall x \in C\} = \{\theta \in E^* ;\ h_C(\theta) = \langle \theta, f\rangle\}.$$

The cone of influence may be better visualized through its polar cone $A^o(f)$
which is contained in $E$ and is generated by $C - f$. In other words,
$f + A^o(f)$ is the support cone of $C$ at its vertex $f$.

We now split $E^*$ into the union of the $A(f)$, $f \in \mathcal{E}$, whose
interiors are disjoint and whose intersections have measure zero. Indeed, for
$f_i \in \mathcal{E}$, $i = 1, 2$,

$$A(f_1) \cap A(f_2) = \{\theta :\ \langle \theta, f_1 - g\rangle \geq 0 \text{ and } \langle \theta, f_2 - g\rangle \geq 0,\ \forall g \in \mathcal{E}\}$$

and therefore taking successively $g = f_1$ and $g = f_2$, we see that
$A(f_1) \cap A(f_2) \subset \{\theta :\ \langle \theta, f_1 - f_2\rangle = 0\}$
which is of measure 0. Therefore, if we write

$$I_f(m) = \int_{A(f)} e^{-\langle \theta, f - m\rangle}\, d\theta,$$

we have

$$J_C(m) = \sum_{f \in \mathcal{E}} I_f(m).$$

In order to compute $I_f(m)$, we now split $A(f)$ into a union of closed
simplicial cones $A_1(f), \ldots, A_{N_f}(f)$ with $n$ generators each, with
disjoint interiors, such that $\cup_{l=1}^{N_f} A_l(f) = A(f)$ and such that
each generator of an $A_l(f)$ is a generator of the cone $A(f)$. Each $A_l(f)$
is the intersection of $n$ half spaces given by

$$\{\theta \in E^* ;\ \langle \theta, x_i^{(l)}\rangle \geq 0\}, \quad i = 1, \ldots, n$$

for some vector $x_i^{(l)}$ of $E$ which is therefore an extreme generator of
$A(f)$. Since the $A_l(f)$ are proper cones of $\mathbb{R}^n$,
$(x_1^{(l)}, \ldots, x_n^{(l)})$ defines a basis of $E$. The vector $f - m$ can
be represented in this basis as

$$f - m = \sum_{i=1}^n \big(f_i^{(l)} - m_i^{(l)}\big)\, x_i^{(l)}.$$

From Lemma 3.2

$$I_f(m) = \sum_{l=1}^{N_f} \int_{A_l(f)} e^{-\langle \theta, f - m\rangle}\, d\theta = \sum_{l=1}^{N_f} \frac{\mathrm{Vol}(\xi_1^{(l)}, \ldots, \xi_n^{(l)})}{\prod_{i=1}^n \big(f_i^{(l)} - m_i^{(l)}\big)} \tag{22}$$

where $(\xi_1^{(l)}, \ldots, \xi_n^{(l)})$ is the dual basis of
$(x_1^{(l)}, \ldots, x_n^{(l)})$ and

$$f_i^{(l)} - m_i^{(l)} = \langle \xi_i^{(l)}, f - m\rangle.$$

Reducing to the same denominator, we obtain that

$$I_f(m) = \frac{P_f(m)}{\prod_{i=1}^M \langle \zeta_i, f - m\rangle}$$

where the $\zeta_i \in E^*$ are taken among the
$\xi_1^{(l)}, \ldots, \xi_n^{(l)}$ with $l = 1, \ldots, N_f$ and where $P_f$ is
a polynomial in $m$ with total degree $< M \leq n N_f$. Note that for fixed
$\zeta \in E^*$ the hyperplane of $E$ defined by

$$H(f, \zeta) = \{m \in E ;\ \langle \zeta, f - m\rangle = 0\}$$

contains the extreme point $f$. If $g$ is another extreme point of $\bar{C}$
and if $\langle \zeta, f\rangle = \langle \zeta, g\rangle$ then
$H(f, \zeta) = H(g, \zeta)$. This means that several factors of the denominator
of $I_f(m)$ can also occur in $I_g(m)$ and therefore

$$J_C(m) = \frac{N(m)}{\prod_{k=1}^K g_k(m)} \tag{23}$$

where $m \mapsto g_k(m)$, $k = 1, \ldots, K$, are distinct affine forms taken
in the list of the $\langle \zeta_i, f - m\rangle$ when $i$ and the extreme
point $f$ vary, and where $N(m)$ is a polynomial in $m$. Since, as a Laplace
transform, $J_C$ is analytic in $C$, there is no point $m$ in $C$ such that
$g_k(m) = 0$. Therefore all facets must be of the form
$\bar{C} \cap \{m ;\ \langle \zeta_i, f - m\rangle = 0\}$ for some
$i \in \{1, \ldots, K\}$. Conversely every face
$\bar{C} \cap \{m ;\ \langle \zeta_i, f - m\rangle = 0\}$ is a facet: this is
proved in the following general lemma.
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Lemma 3.3 Let E be a n dimensional space and let Ci> • • • >Gv generating the dual 
space E* , such that there exists u G E with ((j,u) > for all j and such that R+(j is 
an extreme ray of the convex cone A = Y,k=i R+Ck for all j — 1, ... , N. If B = A° C E 
is the dual cone of A then, for all j = 1, . . . , N, 

BH {x G E ; (Q,x) = 0} 

is a facet of B . 

Proof: Consider the hyperplane of E* defined by P = ; ((,u) = 1}. Without loss 
of generality we assume Q G P for all j — 1, . . . , N. Thus Ci, • • • , (n are the extreme 
points of the polytope S = A fl P. This polytope can be defined as the intersection 
of a finite number of n — 1-dimensional half spaces HP, k = 1, . . . , K where 
Hk = {( £ E* ; ((,Xk) > 0} is a half-space in E* determined by Xk G E. Moreover, 
any particular extreme point (j of S is a face of 5* of dimension and is therefore the 
intersection of fl P,k G / where I C {1, . . . , K} is of cardinality at least n — 1. 
This is equivalent to saying that the linear system in n — 1 unknown variables 

{(e^nP; (C,z fc ) = 0, A;G/} 

has a s a unique solution and we can therefore find n — 1 vectors (xk^Zi which 
are independent. Since 5 is the intersection of the half planes {x : (x,(j) > 0}, j = 
1, . . . , N and since any ( e A can be written as a convex combination of = 
1, . . . , N, the vectors (xk^Zi are in 5 fl {i G £ ; (Cj, x) = 0} and therefore define a 
facet of -B. This completes the proof of Lemma [3 .31 □ The proof of Theorem 13.11 is 

also completed. □ 
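As an illustration of the decomposition $J_C(m) = \sum_f I_f(m)$ used in the proof, the following minimal sketch treats the unit square $C = (0,1)^2$, which is not an example from the paper. For this choice each cone of influence $A(f)$ is an orthant, so $I_f(m) = \prod_i 1/|f_i - m_i|$, and $J_C$ factorizes into the one-dimensional values $1/(m_i(1-m_i))$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def J_square(m):
    """Closed form of J_C(m) for C = (0,1)^2 (assumed illustrative polytope)."""
    out = Fraction(1)
    for mi in m:
        out /= mi * (1 - mi)
    return out

def I_vertex(f, m):
    """I_f(m): integral of exp(-<theta, f - m>) over the orthant A(f) of vertex f."""
    out = Fraction(1)
    for fi, mi in zip(f, m):
        out /= abs(fi - mi)
    return out

m = (Fraction(1, 3), Fraction(1, 4))
vertices = list(product((0, 1), repeat=2))          # the four extreme points
total = sum(I_vertex(f, m) for f in vertices)       # sum over cones of influence
print(total, J_square(m))
```

The four facets $m_1 = 0$, $m_1 = 1$, $m_2 = 0$, $m_2 = 1$ appear as the factors of the common denominator, and here the numerator $N(m)$ is the constant 1.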



3.2 The behaviour of $I(m, \alpha)$ as $\alpha \to 0$
We have the following theorem.

Theorem 3.2 Let $\mu$ be a positive measure on the finite dimensional linear space $E$ such that the interior $C$ of its closed convex support is not empty and is bounded. Denote by $L(\theta) = \int_E e^{\langle\theta, x\rangle} \mu(dx)$ its Laplace transform. For $m \in C$ and for $\alpha > 0$ consider the Diaconis-Ylvisaker integral

\[ I(m, \alpha) = \int_{E^*} \frac{e^{\alpha\langle\theta, m\rangle}}{L(\theta)^{\alpha}}\, d\theta. \]

Then

\[ \lim_{\alpha \to 0} \alpha^n I(m, \alpha) = J_C(m) \tag{24} \]

where $n = \dim E$.

Let us note immediately that a remarkable feature of this result is that the limit $J_C(m)$ of $\alpha^n I(m, \alpha)$ depends on $\mu$ only through its convex support. For instance if $E = \mathbb{R}$, the uniform measure on $(0, 1)$ and the sum $\mu = \delta_0 + \delta_1$ of two Dirac measures share the same $C = (0, 1)$ and the same $J_C(m) = (m(1 - m))^{-1}$. We now need the following lemma.

Lemma 3.4 Let $\mu$ be a bounded measure on some measurable space $\Omega$ and let $f$ be a positive, bounded and measurable function on $\Omega$. Then we have

1. $\|f\|_p \to \|f\|_\infty$ as $p \to \infty$;

2. The function $p \mapsto \|f\|_p$ is either decreasing on $(0, \infty)$ or there exists $p_0 > 0$ such that it is decreasing on $(0, p_0]$ and increasing on $[p_0, +\infty)$.

Proof: Part 1 is well-known. Let us prove Part 2. Let $\nu$ be the image measure of $\mu$ by $x \mapsto \log f(x)$ and let $c$ be its total mass. Then

\[ \|f\|_p = \Big(\int_{-\infty}^{\infty} e^{px}\, \nu(dx)\Big)^{1/p} = \Big(c \int_{-\infty}^{\infty} e^{px}\, \nu_1(dx)\Big)^{1/p} = \exp\Big(\frac{1}{p} k_{\nu_1}(p) + \frac{1}{p} \log c\Big) \]

where $\nu_1 = \nu / c$ is a probability measure and $k_{\nu_1}(p) = \log \int e^{px}\, \nu_1(dx)$. Now we can easily verify through integration by parts that $k_{\nu_1}(p) = \int_0^p (p - t) k''_{\nu_1}(t)\, dt$ and therefore

\[ \frac{1}{p} k_{\nu_1}(p) = \int_0^p \Big(1 - \frac{t}{p}\Big) k''_{\nu_1}(t)\, dt, \]

\[ \frac{d}{dp}\Big(\frac{1}{p} k_{\nu_1}(p)\Big) = \frac{1}{p^2} \int_0^p t\, k''_{\nu_1}(t)\, dt, \]

\[ \frac{d}{dp}\Big(\frac{1}{p} k_{\nu_1}(p) + \frac{1}{p} \log c\Big) = \frac{1}{p^2}\Big(\int_0^p t\, k''_{\nu_1}(t)\, dt - \log c\Big). \]

Now since $k_{\nu_1}$ is strictly convex, the function $p \mapsto h(p) = \int_0^p t\, k''_{\nu_1}(t)\, dt - \log c$ is continuous and increasing. Therefore either $h$ is negative for all $p$, or $h$ is negative until its unique zero $p_0$ and then it is positive. This proves the lemma. □

Proof (of Theorem 3.2): In the integral $\alpha^n I(m, \alpha)$ we make the change of variable $y = \alpha\theta$ and we obtain

\[ \alpha^n I(m, \alpha) = \int_{E^*} \frac{e^{\langle y, m\rangle}}{L(y/\alpha)^{\alpha}}\, dy. \]

We now apply the last lemma to $\Omega = \overline{C}$, to the bounded measure $\mu$, to the function $f(x) = e^{\langle y, x\rangle}$ for some fixed $y \in E^*$ and to $p = 1/\alpha$, so that $\|f\|_p = L(y/\alpha)^{\alpha}$. Denote by $S$ the support of $\mu$. One easily sees that the support function of $C$ satisfies

\[ h_C(\theta) = \sup\{\langle\theta, x\rangle ;\ x \in C\} = \max\{\langle\theta, x\rangle ;\ x \in S\} \]

since $C$ is the interior of the convex hull of $S$. As a consequence the essential sup of $f$ is $e^{h_C(y)}$ and we get

\[ \lim_{\alpha \to 0} L(y/\alpha)^{\alpha} = e^{h_C(y)}. \]
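This limit can be checked numerically in the one-dimensional case $\mu = \delta_0 + \delta_1$, for which $L(\theta) = 1 + e^{\theta}$ and $h_C(y) = \max(y, 0)$. The sketch below is only an illustration under that assumption; the function names are hypothetical, and the logarithm is evaluated in a numerically stable form to avoid overflow for small $\alpha$.

```python
import math

def log_L_power(y, alpha):
    """alpha * log L(y / alpha) for L(theta) = 1 + exp(theta), computed stably."""
    t = y / alpha
    # log(1 + e^t) = max(t, 0) + log(1 + e^{-|t|})
    return alpha * (max(t, 0.0) + math.log1p(math.exp(-abs(t))))

def h_C(y):
    """Support function of C = (0, 1)."""
    return max(y, 0.0)

for y in (-1.0, 0.5, 2.0):
    for alpha in (1.0, 0.1, 0.001):
        # L(y/alpha)^alpha should approach exp(h_C(y)) as alpha decreases
        print(y, alpha, math.exp(log_L_power(y, alpha)), math.exp(h_C(y)))
```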

Furthermore, by Lemma 3.4 the function $p \mapsto \|f\|_p$ is monotonic for $p$ big enough. If $p \mapsto \|f\|_p$ is increasing, $p \mapsto 1/\|f\|_p$ is decreasing and then by the monotone convergence theorem

\[ \lim_{\alpha \to 0} \int_{E^*} \frac{e^{\langle y, m\rangle}}{L(y/\alpha)^{\alpha}}\, dy = \lim_{p \to \infty} \int_{E^*} \frac{e^{\langle y, m\rangle}}{\|e^{\langle y, \cdot\rangle}\|_p}\, dy = \int_{E^*} \lim_{p \to \infty} \frac{e^{\langle y, m\rangle}}{\|e^{\langle y, \cdot\rangle}\|_p}\, dy = \int_{E^*} e^{\langle y, m\rangle - h_C(y)}\, dy = J_C(m). \]

If $p \mapsto \|f\|_p$ is decreasing, $p \mapsto 1/\|f\|_p$ is increasing. In order to show that we can invert the order of limit and integration and apply the monotone convergence theorem as we did in the previous case, we need here to ensure that $\int_{E^*} e^{\langle y, m\rangle - h_C(y)}\, dy$ is finite: Lemma 3.1 shows that it is true. The proof is complete. □

We now give two more examples of functions $J_C(m)$ which we compute using Theorem 3.2.
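Example 2: $C$ is the simplex. Let $e_0 = 0$ and let $(e_1, \ldots, e_n)$ be the canonical basis of $\mathbb{R}^n$. Let $C$ be the interior of the convex set generated by $e_0, \ldots, e_n$. Then $C$ is the set of $m \in \mathbb{R}^n$ such that $m = \sum_{j=0}^{n} \lambda_j e_j$ for some unique positive $\lambda_0, \ldots, \lambda_n$ satisfying $\lambda_0 + \cdots + \lambda_n = 1$ (equivalently, $\lambda_1 + \cdots + \lambda_n < 1$). In this case

\[ J_C(m) = \frac{1}{m_1 m_2 \cdots m_n (1 - m_1 - \cdots - m_n)}. \]

This result can be obtained by computing $I(m, \alpha)$ for $\mu$ in (14) equal to $\mu = \delta_{e_0} + \sum_{i=1}^{n} \delta_{e_i}$. Using elementary methods of integration, we find that

\[ I(m, \alpha) = \int_{\mathbb{R}^n} \frac{e^{\alpha\langle\theta, m\rangle}}{(1 + \sum_{i=1}^{n} e^{\theta_i})^{\alpha}}\, d\theta = \frac{\prod_{i=0}^{n} \Gamma(\alpha m_i)}{\Gamma(\sum_{i=0}^{n} \alpha m_i)} \]

where $m_0 = 1 - \sum_{i=1}^{n} m_i$. Using $z\Gamma(z) = \Gamma(1 + z) \to 1$ as $z \to 0$, we immediately obtain that

\[ J_C(m) = \lim_{\alpha \to 0} \alpha^n I(m, \alpha) = \frac{1}{m_0 m_1 \cdots m_n}. \]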
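The convergence in Example 2 is easy to observe numerically. The following minimal sketch evaluates $\alpha^n I(m, \alpha)$ from the Gamma-function expression above, working on the log scale with `math.lgamma`, and compares it with $1/(m_0 m_1 \cdots m_n)$. The test point `m` and the function names are arbitrary choices made for illustration.

```python
import math

def log_I_simplex(m, alpha):
    """log I(m, alpha) = sum_i log Gamma(alpha m_i) - log Gamma(alpha), with m_0 = 1 - sum(m)."""
    cells = [1.0 - sum(m)] + list(m)
    return sum(math.lgamma(alpha * mi) for mi in cells) - math.lgamma(alpha)

def scaled_I(m, alpha):
    """alpha^n I(m, alpha), where n = len(m) is the dimension of the simplex."""
    return math.exp(len(m) * math.log(alpha) + log_I_simplex(m, alpha))

def J_simplex(m):
    """Limit J_C(m) = 1 / (m_0 m_1 ... m_n)."""
    out = 1.0 / (1.0 - sum(m))
    for mi in m:
        out /= mi
    return out

m = [0.2, 0.3, 0.1]          # arbitrary interior point of the simplex in R^3
for alpha in (1.0, 1e-2, 1e-4, 1e-6):
    print(alpha, scaled_I(m, alpha), J_simplex(m))
```

With $n = 1$ the same functions reproduce the value $(m(1-m))^{-1}$ quoted after Theorem 3.2.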

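Example 3: $C$ for the graphical model $\bullet - \bullet - \bullet$. For simplicity, we will assume that the variables $a, b, c$ are binary so that $m = (m_j, j \in J)$, where $J$ is as defined earlier, can be written $m = (m_D, D \in \mathcal{D})$ where $\mathcal{D} = \{a, b, c, ab, bc\}$. We shall generalize this example in Section 5. From formula (4.8) in [16], we know that

\[ I(m, \alpha) = \Gamma(\alpha(1 - m_a - m_b + m_{ab}))\, \Gamma(\alpha(m_a - m_{ab}))\, \Gamma(\alpha(m_b - m_{ab}))\, \Gamma(\alpha m_{ab}) \]
\[ \qquad \times\ \Gamma(\alpha(1 - m_b - m_c + m_{bc}))\, \Gamma(\alpha(m_b - m_{bc}))\, \Gamma(\alpha(m_c - m_{bc}))\, \Gamma(\alpha m_{bc}) \]
\[ \qquad \times\ \frac{1}{\Gamma(\alpha)\, \Gamma(\alpha m_b)\, \Gamma(\alpha(1 - m_b))} \]

and therefore, using $z\Gamma(z) = \Gamma(1 + z) \to 1$ as $z \to 0$ again, we obtain that

\[ \lim_{\alpha \to 0} \alpha^5 I(m, \alpha) = J_C(m) = \frac{m_b(1 - m_b)}{m_{ab}\, m_{bc}} \times \frac{1}{(1 - m_a - m_b + m_{ab})(m_a - m_{ab})(m_b - m_{ab})} \times \frac{1}{(1 - m_b - m_c + m_{bc})(m_b - m_{bc})(m_c - m_{bc})}. \]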
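A short numerical check of Example 3 is sketched below. It assumes the normalizing constant is the product of the eight clique-cell Gamma factors divided by $\Gamma(\alpha)$ and the two separator-cell Gamma factors, which is the form that gives the $\alpha^{-5}$ scaling. The point `m` and the function names are hypothetical choices.

```python
import math

def chain_terms(m):
    """Cell probabilities of the ab- and bc-marginal tables, and of the b-marginal table."""
    a, b, c, ab, bc = (m[k] for k in ("a", "b", "c", "ab", "bc"))
    cliques = [1 - a - b + ab, a - ab, b - ab, ab,
               1 - b - c + bc, b - bc, c - bc, bc]
    separator = [b, 1 - b]
    return cliques, separator

def log_I_chain(m, alpha):
    """log I(m, alpha) for the binary chain model with generators a, b, c, ab, bc."""
    cliques, separator = chain_terms(m)
    return (sum(math.lgamma(alpha * x) for x in cliques)
            - sum(math.lgamma(alpha * x) for x in separator)
            - math.lgamma(alpha))

def J_chain(m):
    """Claimed limit of alpha^5 I(m, alpha)."""
    cliques, separator = chain_terms(m)
    value = separator[0] * separator[1]
    for x in cliques:
        value /= x
    return value

m = {"a": 0.5, "b": 0.4, "c": 0.3, "ab": 0.25, "bc": 0.15}   # arbitrary interior point
for alpha in (1e-1, 1e-3, 1e-5):
    print(alpha, math.exp(5 * math.log(alpha) + log_I_chain(m, alpha)), J_chain(m))
```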
3.3 The behaviour of $J_C(\lambda m + (1 - \lambda)y)$ when $y \in \overline{C} \setminus C$ and $\lambda \to 0$

In practice the choice of the hyperparameters $m$ and $\alpha$ is ours and, for a given model $J$, it is traditional to take $m = (m_j, j \in J)$ to be the vector of $J$-marginal counts in a fictive contingency table with cell counts all equal to $1/|I|$. In any case, as long as all fictive cell counts are positive, $m$ belongs to the open set $C$ and the behaviour of $I(m, \alpha)$ is given by Theorem 3.2.

When studying the Bayes factor, we will have to consider the case where the data belongs to the boundary $\overline{C} \setminus C = \partial C$ of $C$, that is, to a face of $\overline{C}$. To do so, we will need to describe the behaviour of $J_C(z)$ as $z$ approaches the boundary of $C$ along a straight line. This is done in the following theorem. Without loss of generality, we assume that $m = 0$ so that $J_C(\lambda m + (1 - \lambda)y) = J_C((1 - \lambda)y)$.

Theorem 3.3 Let $C$ be a polytope $\subset E$ with $\dim E = n$ and such that 0 is in the interior of $C$. Let $y \in \partial C$, let $F$ be the face of $\overline{C}$ containing $y$ in its relative interior and let $k$ be the dimension of $F$. Then

\[ \lim_{\lambda \to 0} \lambda^{n-k} J_C((1 - \lambda)y) = D, \]

where $D$ is a positive constant.
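The rate $\lambda^{-(n-k)}$ can be seen on the simplex of Example 2, where $J_C$ is known in closed form. The sketch below takes the triangle in $\mathbb{R}^2$ ($n = 2$) with an arbitrary interior point $m$ instead of the normalization $m = 0$, and evaluates $\lambda^{n-k} J_C(\lambda m + (1-\lambda)y)$ exactly for $y$ in the relative interior of an edge ($k = 1$) and at a vertex ($k = 0$). The names are hypothetical.

```python
from fractions import Fraction

def J_triangle(z):
    """J_C for the open triangle with vertices (0,0), (1,0), (0,1) (Example 2, n = 2)."""
    z1, z2 = z
    return 1 / (z1 * z2 * (1 - z1 - z2))

def scaled_J(m, y, k, lam, n=2):
    """lambda^(n-k) * J_C(lambda m + (1 - lambda) y)."""
    z = tuple(lam * mi + (1 - lam) * yi for mi, yi in zip(m, y))
    return lam ** (n - k) * J_triangle(z)

m = (Fraction(1, 3), Fraction(1, 3))       # interior point
y_edge = (Fraction(1, 2), Fraction(0))     # relative interior of an edge, k = 1
y_vertex = (Fraction(1), Fraction(0))      # a vertex, k = 0

for lam in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    print(float(lam), float(scaled_J(m, y_edge, 1, lam)), float(scaled_J(m, y_vertex, 0, lam)))
```

For this choice of $m$ and $y$ the two scaled quantities stabilize near 12 and 9 respectively, while the unscaled $J_C$ diverges.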
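Proof: From (20) we have

\[ \frac{J_C((1 - \lambda)y)}{n!} = \int_{C^\circ} \frac{d\theta}{(1 - (1 - \lambda)\langle\theta, y\rangle)^{n+1}}. \tag{25} \]

In order to study the behaviour of this last integral when $\lambda \to 0$, we are going to build a parametrization of $C^\circ$ which gives a special role to $\widehat{F}$, the face of $C^\circ$ dual to the face $F$ of $\overline{C}$ containing $y$ in its relative interior.

Let $\mathcal{E}$ be the set of extreme points of $\overline{C}$ and $\mathcal{I} \subset \mathcal{E}$ the set of extreme points of $F$. To $F$ we associate the dual face of $C^\circ$ defined by

\[ \widehat{F} = \{\theta \in C^\circ \mid \langle\theta, f\rangle = 1\ \ \forall f \in \mathcal{I}\}. \tag{26} \]

It is a classical result (see [3]) that $\widehat{F}$ has dimension $n - k - 1$. Let us now observe that we have an equivalent representation of $\widehat{F}$ in (26) as

\[ \widehat{F} = \{\theta \in C^\circ \mid \langle\theta, y\rangle = 1\}. \tag{27} \]

Indeed, since $y$ is in the relative interior of $F$ we write

\[ y = \sum_{f \in \mathcal{I}} \lambda_f f \]

where $\lambda_f > 0$ and $\sum_{f \in \mathcal{I}} \lambda_f = 1$. Here $\lambda_f > 0$ is important in the argument to follow. Clearly $\widehat{F} \subset \{\theta \in C^\circ ;\ \langle\theta, y\rangle = 1\}$. Conversely if $\langle\theta, y\rangle = 1$ then $\sum_{f \in \mathcal{I}} \lambda_f (1 - \langle\theta, f\rangle) = 0$. If furthermore $\theta \in C^\circ$ we have $1 - \langle\theta, f\rangle \ge 0$ and therefore $1 - \langle\theta, f\rangle = 0$, which shows $\widehat{F} \supset \{\theta \in C^\circ ;\ \langle\theta, y\rangle = 1\}$ and proves (27).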

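Next, for $\epsilon > 0$ small, we consider the following approximation $F_\epsilon$ of $\widehat{F}$:

\[ F_\epsilon = \{\theta \in C^\circ ;\ \langle\theta, y\rangle = 1 - \epsilon\}, \tag{28} \]

which is an $(n-1)$-dimensional convex subset of $C^\circ$, and we want to prove that

\[ \mathrm{vol}_{n-1} F_\epsilon \sim c\, \epsilon^k \]

for some positive constant $c$. Using (27) and (26), we can rewrite (28) as

\[ F_\epsilon = \Big\{\theta \in C^\circ ;\ \sum_{f \in \mathcal{I}} \lambda_f (1 - \langle\theta, f\rangle) = \epsilon\Big\}. \tag{29} \]

To show $\mathrm{vol}_{n-1} F_\epsilon \sim c\, \epsilon^k$ we parametrize $F_\epsilon$ as follows: let $\theta \mapsto x = \varphi(\theta)$ be the affine map from $E^*$ to $\mathbb{R}^{\mathcal{I}}$ defined by

\[ x_f = \lambda_f (1 - \langle\theta, f\rangle), \quad f \in \mathcal{I}, \tag{30} \]

which is equivalent to $\langle\theta, f\rangle = 1 - x_f/\lambda_f$. The set $S_\epsilon = \varphi(F_\epsilon)$ is therefore the intersection of the simplex

\[ \Big\{x \in \mathbb{R}^{\mathcal{I}} ;\ x_f \ge 0\ \ \forall f \in \mathcal{I},\ \sum_{f \in \mathcal{I}} x_f = \epsilon\Big\} \tag{31} \]

and of the convex set $\varphi(C^\circ)$, which is contained in the affine manifold $\varphi(E^*) \subset \mathbb{R}^{\mathcal{I}}$. If $x \in S_\epsilon$ then its preimage by $\varphi$ is the set

\[ \varphi^{-1}(x) = \Big\{\theta \in E^* ;\ \langle\theta, f\rangle = 1 - \frac{x_f}{\lambda_f}\ \ \forall f \in \mathcal{I}\Big\}, \]

which is an affine subspace of $E^*$ parallel to the linear space

\[ H = \{\theta \in E^* ;\ \langle\theta, f\rangle = 0\ \ \forall f \in \mathcal{I}\}, \tag{32} \]

which has dimension $n - k - 1$ since $F$ has dimension $k$. As a result we can write $F_\epsilon$ as the following union of disjoint sets

\[ F_\epsilon = \bigcup_{x \in S_\epsilon} \big(\varphi^{-1}(x) \cap C^\circ\big), \tag{33} \]

which is saying that $F_\epsilon$ can be parametrized by $(x, z)$ where $x \in S_\epsilon$, a convex set of dimension $k$, and where $z \in \varphi^{-1}(x) \cap C^\circ$, a convex set of dimension $n - k - 1$. The bijection $\theta \mapsto (x, z)$ is the restriction to $F_\epsilon$ of an affine map and therefore its Jacobian $K$, such that $d\theta = K\, dx\, dz$, is a constant:

\[ \mathrm{vol}_{n-1} F_\epsilon = \int_{F_\epsilon} d\theta = K \int_{S_\epsilon} \Big(\int_{\varphi^{-1}(x) \cap C^\circ} dz\Big) dx. \]

If we fix $\epsilon x^0$ in the simplex (31), then the behaviour of $\int_{\varphi^{-1}(\epsilon x^0) \cap C^\circ} dz$ is easy to describe since $\lim_{\epsilon \to 0} \varphi^{-1}(\epsilon x^0) \cap C^\circ = \widehat{F}$ in the sense of polytopes, which implies

\[ \lim_{\epsilon \to 0} \int_{\varphi^{-1}(\epsilon x^0) \cap C^\circ} dz = \lim_{\epsilon \to 0} \mathrm{vol}_{n-k-1}\big(\varphi^{-1}(\epsilon x^0) \cap C^\circ\big) = \mathrm{vol}_{n-k-1}(\widehat{F}). \]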

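Let us now observe that 0 is an extreme point of $\varphi(C^\circ)$. If not, there exist $x = \varphi(\theta)$ and $x' = \varphi(\theta')$ with $\theta$ and $\theta' \in C^\circ$ such that $x + x' = 0$, that is, for all $f \in \mathcal{I}$,

\[ 1 - \lambda_f \langle\theta, f\rangle + 1 - \lambda_f \langle\theta', f\rangle = 2 - \lambda_f [\langle\theta, f\rangle + \langle\theta', f\rangle] = 0. \]

Since $0 < \lambda_f \le 1$, this in turn implies $\lambda_f = 1$ and $\langle\theta, f\rangle + \langle\theta', f\rangle = 2$. Since $\langle\theta, f\rangle$ and $\langle\theta', f\rangle$ are $\le 1$, this implies $x_f = x'_f = 0$ for all $f \in \mathcal{I}$, a contradiction. Now we use the fact that $C^\circ$ is a polytope and so is $\varphi(C^\circ)$, which has dimension $k$. For $\epsilon$ small enough (say $0 < \epsilon < \epsilon_0$) the intersection $S_\epsilon$ of the simplex given in (31) with $\varphi(C^\circ)$ coincides with the intersection of the simplex with the support cone of $\varphi(C^\circ)$ at its vertex 0. Since a cone is invariant by dilations we can claim that there exists a number $c_1 > 0$ such that for $0 < \epsilon < \epsilon_0$ we have

\[ \mathrm{vol}_k(S_\epsilon) = c_1 \epsilon^k. \]

Finally

\[ \mathrm{vol}_{n-1} F_\epsilon \sim c_1 K\, \mathrm{vol}_{n-k-1}(\widehat{F})\, \epsilon^k. \tag{34} \]

The parametrization of $\theta$ in (25) is therefore $(x, z, \epsilon)$ where $(x, z)$ is as given in (33) and the range of $\epsilon$ is such that, for that range, $F_\epsilon$ describes all of $C^\circ$. We note that the bounded function $\mathrm{vol}_{n-1} F_\epsilon = f(\epsilon)$ is zero if $\epsilon$ is big enough, since then $F_\epsilon$ becomes empty, and, of course, $\mathrm{vol}_{n-1} F_0 = \mathrm{vol}_{n-1} \widehat{F}$. Let $b$ be such that $f(\epsilon) = 0$ when $\epsilon > b$. When $\epsilon$ varies from 0 to $+\infty$, $F_\epsilon$ generates all of $C^\circ$. Then, following (34), equation (25) becomes

\[ \int_{C^\circ} \frac{d\theta}{(1 - (1 - \lambda)\langle\theta, y\rangle)^{n+1}} = \int_0^{\infty} \frac{\mathrm{vol}_{n-1} F_\epsilon\, d\epsilon}{(1 - (1 - \lambda)(1 - \epsilon))^{n+1}} = \int_0^{b} \frac{f(\epsilon)\, d\epsilon}{(1 - (1 - \lambda)(1 - \epsilon))^{n+1}}. \]

Using $f(\epsilon) \sim c\, \epsilon^k$ we will now show that

\[ \lim_{\lambda \to 0} \lambda^{n-k} \int_0^{b} \frac{f(\epsilon)\, d\epsilon}{(1 - (1 - \lambda)(1 - \epsilon))^{n+1}} = c\, B(k + 1, n - k), \tag{35} \]

and this will conclude the proof. To derive (35), we first show that for $0 < a < b$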

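(1) $\displaystyle \lim_{\lambda \to 0} \lambda^{n-k} \int_0^{a} \frac{\epsilon^k\, d\epsilon}{(\lambda + \epsilon - \lambda\epsilon)^{n+1}} = B(k + 1, n - k)$,

(2) $\displaystyle \lim_{\lambda \to 0} \lambda^{n-k} \int_a^{b} \frac{d\epsilon}{(\lambda + \epsilon - \lambda\epsilon)^{n+1}} = 0$.

Statement (1) is shown by the change of variable $\epsilon = \lambda t$ and the theorem of dominated convergence. Indeed, for $0 < \lambda < \lambda_0 < 1$, we have

\[ \lambda^{n-k} \int_0^{a} \frac{\epsilon^k\, d\epsilon}{(\lambda + \epsilon - \lambda\epsilon)^{n+1}} = \int_0^{a/\lambda} \frac{t^k\, dt}{(1 + t - \lambda t)^{n+1}} \le \int_0^{a/\lambda} \frac{t^k\, dt}{(1 + t - \lambda_0 t)^{n+1}}, \]

which tends to $\frac{1}{(1 - \lambda_0)^{k+1}} B(k + 1, n - k)$ when $\lambda \to 0$. Since this is true for any $\lambda_0 > 0$, statement (1) follows.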
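Statement (1) can also be checked numerically. The sketch below is an illustration only: after $\epsilon = \lambda t$ it applies the additional substitution $t = s/(1-s)$, which is a choice made here and is not taken from the proof. This turns the scaled integral into $\int_0^{s_{\max}} s^k (1-s)^{n-k-1} (1 - \lambda s)^{-(n+1)}\, ds$ with $s_{\max} = (a/\lambda)/(1 + a/\lambda)$, and a midpoint rule is then applied. Function names and the step count are arbitrary.

```python
import math

def beta(p, q):
    """Euler Beta function B(p, q) via log-Gamma."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))

def scaled_integral(n, k, lam, a=1.0, steps=20_000):
    """lam^(n-k) * int_0^a eps^k / (lam + eps - lam*eps)^(n+1) d eps, for 0 <= k <= n-1.

    Uses eps = lam * t and t = s / (1 - s), then a midpoint rule in s.
    """
    t_max = a / lam
    s_max = t_max / (1.0 + t_max)
    h = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += s ** k * (1.0 - s) ** (n - k - 1) / (1.0 - lam * s) ** (n + 1)
    return total * h

n, k = 3, 1
for lam in (1e-1, 1e-2, 1e-3):
    print(lam, scaled_integral(n, k, lam), beta(k + 1, n - k))
```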

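Statement (2) is obvious since, for $\lambda$ small, $\int_a^{b} \frac{d\epsilon}{(\lambda + \epsilon - \lambda\epsilon)^{n+1}}$ is bounded by a constant multiple of $\int_a^{b} \frac{d\epsilon}{\epsilon^{n+1}}$, which is finite. Next, fix $\delta > 0$. There exists $a < b$ such that $|f(\epsilon)/\epsilon^k - c| < \delta$ if $0 < \epsilon < a$. Writing this as $-\delta\epsilon^k < f(\epsilon) - c\epsilon^k < \delta\epsilon^k$, integrating and using (1) yields

\[ \limsup_{\lambda \to 0} \Big| \frac{\lambda^{n-k}}{B(k + 1, n - k)} \int_0^{a} \frac{f(\epsilon)\, d\epsilon}{(1 - (1 - \lambda)(1 - \epsilon))^{n+1}} - c \Big| \le \delta. \]

Since $f$ is bounded, (2) implies that

\[ \limsup_{\lambda \to 0} \lambda^{n-k} \int_a^{b} \frac{f(\epsilon)\, d\epsilon}{(1 - (1 - \lambda)(1 - \epsilon))^{n+1}} \le \sup f \cdot \limsup_{\lambda \to 0} \lambda^{n-k} \int_a^{b} \frac{d\epsilon}{(1 - (1 - \lambda)(1 - \epsilon))^{n+1}} = 0. \]

Thus for all $\delta > 0$ we have

\[ \limsup_{\lambda \to 0} \Big| \frac{\lambda^{n-k}}{B(k + 1, n - k)} \int_0^{b} \frac{f(\epsilon)\, d\epsilon}{(1 - (1 - \lambda)(1 - \epsilon))^{n+1}} - c \Big| \le \delta, \]

which implies (35). □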



4 The limiting behaviour of the Bayes factor 

Let us recall that, under the uniform distribution on the class of hierarchical models, the Bayes factor between two models $J_1$ and $J_2$ is equal to

\[ B_{1,2} = \frac{I_1\big(\frac{\alpha m_1 + t_1}{\alpha + N}, \alpha + N\big)\, I_2(m_2, \alpha)}{I_2\big(\frac{\alpha m_2 + t_2}{\alpha + N}, \alpha + N\big)\, I_1(m_1, \alpha)} \]

where $t_i = t_{J_i} = (t(j), j \in J_i)$, $i = 1, 2$.

4.1 The case where the data is in the interior $C$ of $\overline{C}$

The data is of course given in the form of a contingency table with cell counts $n = (n(i), i \in I)$. In this subsection, we consider the case where the data, which appears under the form $t_i$ in model $J_i$, is such that $\frac{t_i}{N}$ belongs to $C_i$, $i = 1, 2$, so that $I_i(\frac{t_i}{N}, N)$, $i = 1, 2$, are finite and positive. In this case, from Theorem 3.2 we know that, as $\alpha \to 0$,

\[ B_{1,2} \sim \alpha^{|J_1| - |J_2|}\, \frac{I_1(\frac{t_1}{N}, N)\, J_{C_2}(m_2)}{I_2(\frac{t_2}{N}, N)\, J_{C_1}(m_1)}. \tag{36} \]

Since the numbers $J_{C_i}(m_i)$, $i = 1, 2$, are finite and positive, we have the following corollary of Theorem 3.2.

Corollary 4.1 When the data belong to the open polytopes $C_i$, $i = 1, 2$, the Bayes factor $B_{1,2}$ is such that, when $\alpha \to 0$,

\[ B_{1,2} \sim D\, \alpha^{|J_1| - |J_2|}, \]

where $D$ is the positive constant appearing in (36). This implies in particular that, when the data is in both $C_i$, $i = 1, 2$, the Bayes factor always favours the sparser model.

The proof follows immediately from (36). Moreover, when $\alpha \to 0$ and $|J_2| < |J_1|$, $B_{1,2}$ tends to 0. This result has been well known, at least numerically, for the class of decomposable models and in that case it can be proved by expressing the Bayes factor as in (4.8) of [16] and using the fact that $\Gamma(\alpha) \sim \alpha^{-1}$ as $\alpha \to 0$ (see Example 3 of Section 3 and Section 5.2). It has also been observed to hold numerically, most of the time, for hierarchical models. Computations illustrating the fact that the Bayes factor tends to favour the sparser models in the class of all hierarchical models can be found in [16], p. 3456. We have just shown that it actually always holds when the data is in $C_1$ and in $C_2$. We will see in the next subsection that things are more delicate when the data belongs to the boundary of at least one of $C_1$ or $C_2$.

4.2 The case where the data belongs to a face of $\overline{C}_i$, $i = 1, 2$

In this case, when $\alpha \to 0$, $\frac{\alpha m_i + t_i}{\alpha + N}$ converges to the boundary point $\frac{t_i}{N}$ of $C_i$ along the segment

\[ s_i(\alpha) = \frac{\alpha m_i + t_i}{\alpha + N} = \frac{\alpha}{\alpha + N}\, m_i + \Big(1 - \frac{\alpha}{\alpha + N}\Big) \frac{t_i}{N}. \tag{37} \]

We need to study the limiting behaviour of $B_{1,2}$ when $\alpha \to 0$. To do so, we will use Theorem 3.3 to obtain the following result.

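Theorem 4.1 Suppose that $\frac{t}{N} \in \overline{C} \setminus C$ belongs to the relative interior of a face $F$ of dimension $k$. Then

\[ \lim_{\alpha \to 0} \alpha^{|J| - k}\, I\Big(\frac{\alpha m + t}{\alpha + N}, \alpha + N\Big) \tag{38} \]

exists and is positive.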
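For the saturated multinomial model, Theorem 4.1 can be verified directly from the Gamma-function expression of Example 2, since $I\big(\frac{\alpha m + t}{\alpha + N}, \alpha + N\big) = \prod_i \Gamma(\alpha m_i + t_i) / \Gamma(\alpha + N)$, the product being over all cells. The sketch below assumes this model, with hypothetical counts `t` and fictive probabilities `m`. The face containing $t/N$ is spanned by the vertices of the nonempty cells, so $k$ is the number of nonempty cells minus one and $|J| - k$ is the number of empty cells.

```python
import math

def log_I_post(m, t, alpha):
    """log I((alpha m + t)/(alpha + N), alpha + N) for the saturated model (all cells listed)."""
    N = sum(t)
    return sum(math.lgamma(alpha * mi + ti) for mi, ti in zip(m, t)) - math.lgamma(alpha + N)

m = [0.25, 0.25, 0.25, 0.25]   # fictive cell probabilities, reference cell included
t = [3, 0, 2, 0]               # hypothetical observed counts with two empty cells
n = len(m) - 1                 # dimension |J| of the model
k = sum(1 for ti in t if ti > 0) - 1   # dimension of the face containing t/N

# Limit predicted by Gamma(z) ~ 1/z on the empty cells
limit = math.exp(sum(math.lgamma(ti) for ti in t if ti > 0) - math.lgamma(sum(t)))
for mi, ti in zip(m, t):
    if ti == 0:
        limit /= mi

def scaled_post(alpha):
    return math.exp((n - k) * math.log(alpha) + log_I_post(m, t, alpha))

for alpha in (1e-2, 1e-4, 1e-6):
    print(alpha, scaled_post(alpha), limit)
```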

From Theorems 3.2 and 4.1 we immediately derive the following, which is the object of this paper.

Corollary 4.2 Consider two hierarchical models $J_i$, $i = 1, 2$, of dimension $|J_i|$. Assume that the data belongs to the relative interior of a face $F_i$ of $\overline{C}_i$ of dimension $k_i$. Then the asymptotic behaviour of the Bayes factor $B_{1,2}$ when $\alpha \to 0$ is given by

\[ B_{1,2} \sim D\, \alpha^{k_1 - k_2} \]

where $D$ is a finite positive constant. The Bayes factor favours the model which contains the data in the relative interior of the face of $\overline{C}_i$ of smallest dimension.

The proof is immediate. According to Theorems 3.2 and 4.1, we have

\[ B_{1,2} = \frac{I_2(m_2, \alpha)}{I_1(m_1, \alpha)} \cdot \frac{I_1\big(\frac{\alpha m_1 + t_1}{\alpha + N}, \alpha + N\big)}{I_2\big(\frac{\alpha m_2 + t_2}{\alpha + N}, \alpha + N\big)} \sim D\, \alpha^{|J_1| - |J_2|}\, \alpha^{(k_1 - |J_1|) - (k_2 - |J_2|)} = D\, \alpha^{k_1 - k_2}, \]

which proves our result.
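The exponent $k_1 - k_2$ can be observed numerically for two decomposable models whose normalizing constants are products of Gamma functions. The sketch below compares the saturated model on three binary variables with the chain model generated by $ab$ and $bc$, on a hypothetical $2 \times 2 \times 2$ table with empty cells. It assumes a fictive table with all eight cells equal to $\alpha/8$, and estimates the slope of $\log B_{1,2}$ against $\log \alpha$. All names are illustrative.

```python
import math
from itertools import product

CELLS = list(product((0, 1), repeat=3))   # cells indexed by (a, b, c)

def marginal(table, keep):
    """Marginal table over the coordinates listed in keep."""
    out = {}
    for cell, count in table.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0) + count
    return out

def log_gamma_sum(table, prior_cell):
    return sum(math.lgamma(prior_cell + count) for count in table.values())

def log_I1(table, alpha):
    """Saturated model: product over cells of Gamma, divided by Gamma(alpha + N)."""
    return log_gamma_sum(table, alpha / 8) - math.lgamma(alpha + sum(table.values()))

def log_I2(table, alpha):
    """Chain model: cliques ab and bc, separator b."""
    ab, bc, b = marginal(table, (0, 1)), marginal(table, (1, 2)), marginal(table, (1,))
    return (log_gamma_sum(ab, alpha / 4) + log_gamma_sum(bc, alpha / 4)
            - log_gamma_sum(b, alpha / 2) - math.lgamma(alpha + sum(table.values())))

def log_B12(table, alpha):
    empty = dict.fromkeys(table, 0)       # prior normalizing constants use zero counts
    return ((log_I1(table, alpha) - log_I1(empty, alpha))
            - (log_I2(table, alpha) - log_I2(empty, alpha)))

table = dict(zip(CELLS, [3, 0, 1, 2, 0, 0, 4, 1]))   # hypothetical data
a1, a2 = 1e-6, 1e-8
slope = (log_B12(table, a1) - log_B12(table, a2)) / (math.log(a1) - math.log(a2))
print(slope)   # approximates k_1 - k_2
```

For this table the data has five nonempty cells, three nonempty $ab$ cells, three nonempty $bc$ cells and two nonempty $b$ cells, so $k_1 = 4$ and $k_2 = 3$.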



Remark 4.1 We note that, if $\frac{t_i}{N} \in C_i$, $i = 1, 2$, since $C_i$ is the face of $\overline{C}_i$ of dimension $|J_i|$, then $k_i = |J_i|$ and Corollary 4.2 yields Corollary 4.1. For the same reason, Corollary 4.2 also deals with the cases where $\frac{t_i}{N} \in C_i$ for only $i = 1$ or $i = 2$.

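Proof (of Theorem 4.1): For simplicity of notation, we will write $n = |J|$ and $y = t/N$ where $N > 0$ is fixed. Similarly, to simplify the expression of (37) above, we define

\[ \lambda = \frac{\alpha}{\alpha + N} \in (0, 1), \]

which implies $\alpha = \frac{\lambda N}{1 - \lambda}$ and $\alpha + N = \frac{N}{1 - \lambda}$. Our problem is then equivalent to showing that if $y$ belongs to the relative interior of $F$ and if $m \in C$ then

\[ \lim_{\lambda \to 0} \lambda^{n-k}\, I\Big(\lambda m + (1 - \lambda)y, \frac{N}{1 - \lambda}\Big) \]

exists and is positive. The idea of the proof is to consider the difference

\[ D(\lambda) = J_C(\lambda m + (1 - \lambda)y) - \Big(\frac{N}{1 - \lambda}\Big)^{n} I\Big(\lambda m + (1 - \lambda)y, \frac{N}{1 - \lambda}\Big) \]

so that

\[ \Big(\frac{N}{1 - \lambda}\Big)^{n} I\Big(\lambda m + (1 - \lambda)y, \frac{N}{1 - \lambda}\Big) = J_C(\lambda m + (1 - \lambda)y) - D(\lambda). \tag{39} \]

Then, since from Theorem 3.3 $\lim_{\lambda \to 0} \lambda^{n-k} J_C(\lambda m + (1 - \lambda)y)$ exists and is positive, if we show that $\lim_{\lambda \to 0} \lambda^{n-k} D(\lambda)$ exists and is positive, Theorem 4.1 will be proved. We proceed to do so now.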

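After the change of variable $\theta \in \mathbb{R}^n \mapsto \frac{1 - \lambda}{N}\theta \in \mathbb{R}^n$ in $I(\lambda m + (1 - \lambda)y, \frac{N}{1 - \lambda})$, $D(\lambda)$ can be written

\[ D(\lambda) = \int_{\mathbb{R}^n} e^{\langle\theta, \lambda m + (1 - \lambda)y\rangle - h_C(\theta)} \Big[1 - e^{h_C(\theta)}\, L((1 - \lambda)\theta/N)^{-\frac{N}{1 - \lambda}}\Big]\, d\theta \]

where $L(\cdot)$ is defined in (11).

Consider the cone $A(f_k) = \{\theta ;\ h_C(\theta) = \langle\theta, f_k\rangle\}$. Let

\[ D_k(\lambda) = \int_{A(f_k)} e^{\langle\theta, \lambda m + (1 - \lambda)y - f_k\rangle} \Big[1 - e^{\langle\theta, f_k\rangle}\, L((1 - \lambda)\theta/N)^{-\frac{N}{1 - \lambda}}\Big]\, d\theta. \]

Since $D(\lambda) = \sum_{f_k \in \mathcal{E}} D_k(\lambda)$, we need only show that for each $k$, $\lim_{\lambda \to 0} \lambda^{n-k} D_k(\lambda)$ exists. Without loss of generality we can assume that $f_k = 0$ (if not, we replace $m$ and $y$ by $m - f_k$ and $y - f_k$). We want to prove that $\lim_{\lambda \to 0} \lambda^{n-k} D_0(\lambda)$ exists and is finite. To do so, we split the cone $A(0)$ into a union of closed simplicial cones $(A(0)_s)_{s \in S_0}$ with disjoint interiors so that $A(0) = \cup_{s \in S_0} A(0)_s$ and we write $D_0(\lambda) = \sum_{s \in S_0} \Delta_s(\lambda)$ where

\[ \Delta_s(\lambda) = \int_{A(0)_s} e^{\langle\theta, \lambda m + (1 - \lambda)y\rangle} \Big[1 - L((1 - \lambda)\theta/N)^{-\frac{N}{1 - \lambda}}\Big]\, d\theta. \tag{40} \]

We need to prove that $\lim_{\lambda \to 0} \lambda^{n-k} \Delta_s(\lambda)$ exists and is nonnegative. The simplicial cone $A(0)_s$ is defined by $n$ independent linear vectors $g_j$ in $\mathbb{R}^n$ as

\[ A(0)_s = \{\theta = \lambda_1 g_1 + \cdots + \lambda_n g_n \in \mathbb{R}^n ;\ \lambda_1, \ldots, \lambda_n \le 0\}. \]

Without loss of generality we assume $|\det(g_1, \ldots, g_n)| = 1$. For simplicity we write

\[ m_j(\lambda) = \langle g_j, \lambda m + (1 - \lambda)y\rangle. \]

We observe that $m_j(\lambda) > 0$ on $A(0)_s^\circ \supset A(0)^\circ$. Thus by the change of variable $\theta = \lambda_1 g_1 + \cdots + \lambda_n g_n \mapsto (\lambda_1, \ldots, \lambda_n)$ (giving $d\theta = d\lambda_1 \cdots d\lambda_n$) we obtain

\[ \int_{A(0)_s} e^{\langle\theta, \lambda m + (1 - \lambda)y\rangle}\, d\theta = \frac{1}{\prod_{j=1}^{n} m_j(\lambda)}. \]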

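Now, in the integral $\int_{A(0)_s} e^{\langle\theta, \lambda m + (1 - \lambda)y\rangle}\, L((1 - \lambda)\theta/N)^{-\frac{N}{1 - \lambda}}\, d\theta$ in the right-hand side of (40), we make the further change of variable

\[ (\lambda_j,\ j = 1, \ldots, n) \mapsto (u_j = e^{(1 - \lambda)\lambda_j/N},\ j = 1, \ldots, n), \]

giving $d\lambda_1 \cdots d\lambda_n = \big(\frac{N}{1 - \lambda}\big)^{n} \frac{du_1 \cdots du_n}{u_1 \cdots u_n}$ with $u_j \in (0, 1)$. Moreover, since $L(\theta) = 1 + \sum_{i \ne 0,\, i \in I} e^{\langle\theta, f_i\rangle}$, we have

\[ L\Big(\frac{1 - \lambda}{N}\theta\Big) = L\big((\log u_1) g_1 + \cdots + (\log u_n) g_n\big) = 1 + \sum_{i \ne 0} \prod_{j=1}^{n} u_j^{a_{ij}} \]

where $a_{ij} = \langle g_j, f_i\rangle \ge 0$ since $\langle g_j, x\rangle = 0$ defines a supporting hyperplane of $C$ at the vertex 0. To simplify notation, write

\[ h_\lambda(u_1, \ldots, u_n) = \Big(1 + \sum_{i \ne 0} \prod_{j=1}^{n} u_j^{a_{ij}}\Big)^{-\frac{N}{1 - \lambda}}. \]

Recalling that $\alpha + N = \frac{N}{1 - \lambda}$ we then have

\[ \Delta_s(\lambda) = \frac{1}{\prod_{j=1}^{n} m_j(\lambda)}\, [1 - K(\lambda)] \tag{41} \]

where

\[ K(\lambda) = (\alpha + N)^{n} \prod_{j=1}^{n} m_j(\lambda) \times \int_{(0,1)^n} h_\lambda(u_1, \ldots, u_n) \prod_{j=1}^{n} u_j^{(\alpha + N) m_j(\lambda) - 1}\, du_j. \]

Note that $h_\lambda \le 1$ and that $K(\lambda)$ can be seen as the expectation of $h_\lambda(U_1, \ldots, U_n)$ where $U_1, \ldots, U_n$ are independent random variables with density $f(u_j) = (\alpha + N)\, m_j(\lambda)\, u_j^{(\alpha + N) m_j(\lambda) - 1}$ on $(0, 1)$.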

Now to determine the behaviour of $\Delta_s(\lambda)$ when $\lambda \to 0$, we recall that since $A(0)_s$ is simplicial, the supporting hyperplanes of $A(0)_s^\circ$ are the hyperplanes defined by

\[ \{x \in \mathbb{R}^n ;\ \langle g_j, x\rangle = 0\}, \quad j = 1, \ldots, n. \tag{42} \]

The data point $y$ either belongs to a face of $A(0)_s^\circ$ of dimension less than $n$ or it belongs to its interior. If it belongs to its interior, then for all $j = 1, \ldots, n$,

\[ \lim_{\lambda \to 0} m_j(\lambda) = \langle g_j, y\rangle > 0 \]

and standard reasoning shows that the limit of $K(\lambda)$ exists and is equal to $K = E(h_0(U_1, \ldots, U_n))$ where the $U_j$, $j = 1, \ldots, n$, follow independent distributions with density $f_j(u_j) = N\langle g_j, y\rangle\, u_j^{N\langle g_j, y\rangle - 1}$, and therefore

\[ \lim_{\lambda \to 0} \lambda^{n-k} \Delta_s(\lambda) = 0. \]

If $y$ belongs to a face $F^\circ$ of $A(0)_s^\circ$, the dimension $k^\circ$ of $F^\circ$ is greater than or equal to $k$ so that $n - k^\circ \le n - k$. Therefore $F^\circ$ is contained in the intersection of $n - k^\circ$ hyperplanes of the type (42). Without loss of generality we can assume that these supporting hyperplanes of $A(0)_s^\circ$ have been numbered so that the first $n - k^\circ$ are those containing $y$, that is,

\[ \{x \in \mathbb{R}^n ;\ \langle g_j, x\rangle = 0\} \]

for $j = 1, \ldots, n - k^\circ$. As a consequence we have that $\lim_{\lambda \to 0} m_j(\lambda) = 0$ for $j = 1, \ldots, n - k^\circ$ and $\lim_{\lambda \to 0} m_j(\lambda) = \langle g_j, y\rangle > 0$ if $j = n - k^\circ + 1, \ldots, n$. Thus the limiting distribution of $U_j$ when $\lambda \to 0$ is the Dirac mass at 0 if $j \le n - k^\circ$ and is the distribution with density $N\langle g_j, y\rangle\, u_j^{N\langle g_j, y\rangle - 1}$ on $(0, 1)$ if $n - k^\circ < j \le n$. It is easy to show that

\[ K = \lim_{\lambda \to 0} K(\lambda) \tag{43} \]

exists and is

\[ K = N^{k^\circ} \prod_{j=n-k^\circ+1}^{n} \langle g_j, y\rangle \int_0^1 \cdots \int_0^1 h_0(0, \ldots, 0, u_{n-k^\circ+1}, \ldots, u_n) \prod_{j=n-k^\circ+1}^{n} u_j^{N\langle g_j, y\rangle - 1}\, du_j. \]

Recall that for $1 \le j \le n - k^\circ$ we have $m_j(\lambda) = \lambda\langle g_j, m\rangle$. We get from (41) that

\[ \lim_{\lambda \to 0} \lambda^{n-k^\circ} \Delta_s(\lambda) = \frac{1}{\prod_{j=1}^{n-k^\circ} \langle g_j, m\rangle\ \prod_{j=n-k^\circ+1}^{n} \langle g_j, y\rangle}\, (1 - K). \]

Since $h_0 \le 1$, this limit exists and is nonnegative, and so does $\lim_{\lambda \to 0} \lambda^{n-k} \Delta_s(\lambda)$. Moreover, since $I(m, \alpha)$ is always positive, we also have that $\lim_{\alpha \to 0} \alpha^{n-k}\, I\big(\frac{\alpha m + t}{\alpha + N}, \alpha + N\big)$ always exists and is nonnegative.

We now need to prove that the latter is actually positive. To do so, it is sufficient to prove that $K$ in (43) is strictly less than 1 for the $K(\lambda)$ corresponding to at least one of the simplicial cones $A(f)_s$. To do so, let us first remark that if $y$ coincides with a vertex $f$ of $C$, then $\lim_{\lambda \to 0} m_j(\lambda) = 0$ for all $j = 1, \ldots, n$ so that, as $\lambda$ tends to 0, all densities $(\alpha + N)\, m_j(\lambda)\, u_j^{(\alpha + N) m_j(\lambda) - 1}\, 1_{(0,1)}(u_j)\, du_j$ tend to the Dirac mass at 0 and $K(\lambda)$ tends to 1. So, for such an $A(f)_s$, $K = 1$. Clearly there exists an $f_{i_0}$ such that $y$ does not coincide with $f_{i_0}$ and also such that $k^\circ = k$ (otherwise, $F$ would not contain $y$ in its relative interior). In such a case, the number $n - k$ of faces of $A(f_{i_0})_s^\circ$ containing $y$ is strictly less than $n$ and for $j = 1, \ldots, n - k$, $\langle g_j, f_{i_0}\rangle = 0$ and therefore

\[ 1 + \sum_{i \ne 0} \prod_{j=1}^{n} u_j^{a_{ij}} \ge 1 + \prod_{j=n-k+1}^{n} u_j^{a_{i_0 j}} > 1 \]

and $h_\lambda(u_1, \ldots, u_n) < 1$. Since the densities of $U_j$, $j = n - k + 1, \ldots, n$, are proper Beta densities when $\lambda \to 0$, the limit $K$ is strictly less than 1 and we have now proved that $\lim_{\alpha \to 0} \alpha^{n-k}\, I\big(\frac{\alpha m + t}{\alpha + N}, \alpha + N\big)$ is strictly positive. □



4.3 The results of Steck and Jaakkola [20] as a particular case

In [20] Steck and Jaakkola study the behaviour of the Bayes factor for two Bayesian network models differing by one edge only, when $\alpha \to 0$. They show it is equivalent to the problem of comparing two Bayesian network models with three variables indexed by $\{a, b, c\}$. The first model has directed edges $(b, a)$, $(b, c)$ and $(a, c)$. The second model has directed edges $(b, a)$ and $(b, c)$. These two Bayesian network models are Markov equivalent to the two hierarchical (in fact graphical) models $J_1$ and $J_2$ with, respectively, generating sets $\mathcal{D}_1 = \{abc\}$ and $\mathcal{D}_2 = \{ab, bc\}$. Moreover on these two models, the prior in [20] is equivalent to ours. We must then be able to compare their result given in Proposition 1 of [20] and our result given in Corollary 4.2. To give their results Steck and Jaakkola [20] introduce the quantity

\[ d_{EDF} = \sum_{i \in I} \delta(n(i)) - \sum_{i_{ab} \in I_{ab}} \delta(n(i_{ab})) - \sum_{i_{bc} \in I_{bc}} \delta(n(i_{bc})) + \sum_{i_b \in I_b} \delta(n(i_b)) \tag{44} \]

where $\delta(\cdot)$ is an indicator function such that $\delta(x) = 0$ if $x = 0$ and $\delta(x) = 1$ otherwise. They state that the Bayes factor $B_{1,2}$ behaves as follows:

\[ \lim_{\alpha \to 0} B_{1,2} = \begin{cases} 0 & \text{if } d_{EDF} > 0 \\ +\infty & \text{if } d_{EDF} < 0. \end{cases} \]

This result coincides with our Corollaries 4.1 and 4.2 for three variable models. In fact, we are going to show the following.

Proposition 4.1 Consider the two decomposable graphical models on three variables, $J_1$ and $J_2$, as defined above. If the data belongs to faces of dimension $k_1$ and $k_2$ of, respectively, $\overline{C}_1$ and $\overline{C}_2$, then we have

\[ d_{EDF} = k_1 - k_2. \]

Proof: The Bayes factor is equal to

\[ B_{1,2} = \frac{I(m_2, \alpha)\, I\big(\frac{\alpha m_1 + t_1}{\alpha + N}, \alpha + N\big)}{I(m_1, \alpha)\, I\big(\frac{\alpha m_2 + t_2}{\alpha + N}, \alpha + N\big)} \]

where the form of the normalizing constants $I(m, \alpha)$ for decomposable models is well known (see for example equation (4.8) of [16]). When $\alpha \to 0$, from Theorem 3.2 we know that

\[ \frac{I(m_2, \alpha)}{I(m_1, \alpha)} \sim \alpha^{|J_1| - |J_2|}\, \frac{J_{C_2}(m_2)}{J_{C_1}(m_1)}. \]

Expressed in terms of cell counts for the full table and for the $b$-, $ab$- and $bc$-marginal tables, we have

\[ \frac{I\big(\frac{\alpha m_1 + t_1}{\alpha + N}, \alpha + N\big)}{I\big(\frac{\alpha m_2 + t_2}{\alpha + N}, \alpha + N\big)} = \frac{\prod_{i \in I} \Gamma(\alpha m(i) + n(i))\ \prod_{i_b \in I_b} \Gamma(\alpha m(i_b) + n(i_b))}{\prod_{i_{ab} \in I_{ab}} \Gamma(\alpha m(i_{ab}) + n(i_{ab}))\ \prod_{i_{bc} \in I_{bc}} \Gamma(\alpha m(i_{bc}) + n(i_{bc}))}. \tag{45} \]

If, in the full table or in the $ab$-, $bc$- or $b$-marginal table, the cell count $n(i_D)$ is different from 0, then, when $\alpha \to 0$, $\Gamma(\alpha m(i_D) + n(i_D)) \to \Gamma(n(i_D))$, which is finite. If $n(i_D) = 0$, then $\Gamma(\alpha m(i_D) + n(i_D)) \sim (\alpha m(i_D))^{-1}$. It follows from (45), together with the behaviour of $I(m_2, \alpha)/I(m_1, \alpha)$ given above, that, when $\alpha \to 0$, $B_{1,2}$ behaves like a positive constant times $\alpha^q$ where

\[ q = \Big[|J_1| - \sum_{i \in I} (1 - \delta(n(i)))\Big] - \Big[|J_2| - \sum_{i_{ab} \in I_{ab}} (1 - \delta(n(i_{ab}))) - \sum_{i_{bc} \in I_{bc}} (1 - \delta(n(i_{bc}))) + \sum_{i_b \in I_b} (1 - \delta(n(i_b)))\Big]. \]

Let $C_i$, $i = 1, 2$, be the interior of the convex hull corresponding to model $J_i$. Consider model $J_1$ first. It is immediate to see that, following the notation of (47) and (48) in Section 5 below,

\[ n(000) = g_{0,abc}, \qquad n(i) = g_{i,abc},\ i \in I, \]

and according to Theorem 5.1, $n(000) = 0$ and $n(i) = 0$ are the equations of the facets of the polytope $\overline{C}_1$. Therefore the dimension of the space minus the number of distinct facets the data belongs to is equal to the dimension of the face of $\overline{C}_1$ containing the data, that is,

\[ |J_1| - \sum_{i \in I} (1 - \delta(n(i))) = \sum_{i \in I} \delta(n(i)) - 1 = k_1. \tag{46} \]

Similarly, for model $J_2$, according to Theorem 5.1, the equations of the facets of $\overline{C}_2$ are given by

\[ n(i_{ab}) = 0,\ i_{ab} \in I_{ab} \quad \text{and} \quad n(i_{bc}) = 0,\ i_{bc} \in I_{bc}. \]

The facets containing the data are therefore those defined by $n(i_{ab}) = 0$ or $n(i_{bc}) = 0$. This does not mean, however, that

\[ |J_2| - \sum_{i_{ab} \in I_{ab}} (1 - \delta(n(i_{ab}))) - \sum_{i_{bc} \in I_{bc}} (1 - \delta(n(i_{bc}))) \]

represents the dimension of the face containing the data. Indeed, if for some $i_b \in I_b$ we have $n(i_b) = 0$, this means that $n(i_{ab}) = 0$ also whenever $(i_{ab})_b = i_b$, and also $n(i_{bc}) = 0$ whenever $(i_{bc})_b = i_b$. Then clearly, one of the equations $n(i_{ab}) = 0$ or $n(i_{bc}) = 0$ is redundant and we subtract $1 - \delta(n(i_b))$ from the count of facets defining the position of the data. It is clear then that

\[ |J_2| - \sum_{i_{ab} \in I_{ab}} (1 - \delta(n(i_{ab}))) - \sum_{i_{bc} \in I_{bc}} (1 - \delta(n(i_{bc}))) + \sum_{i_b \in I_b} (1 - \delta(n(i_b))) = k_2, \]

which, together with (46), proves the proposition. □
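The identity $d_{EDF} = k_1 - k_2$ can be checked on a small table. The sketch below computes $d_{EDF}$ from (44) and the face dimensions from the counting formulas in the proof. As an independent geometric check it also computes the affine dimension of the set of vertices $f_i$ of $\overline{C}_2$ whose $ab$- and $bc$-marginal cells are nonempty in the data. This last step assumes, following Theorem 5.1 and its corollary for decomposable models, that the facets containing the data are exactly those of empty marginal cells. The table and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

CELLS = list(product((0, 1), repeat=3))            # cells (a, b, c)
GENERATORS = [(0,), (1,), (2,), (0, 1), (1, 2)]    # a, b, c, ab, bc

def marginal(table, keep):
    out = {}
    for cell, count in table.items():
        key = tuple(cell[i] for i in keep)
        out[key] = out.get(key, 0) + count
    return out

def positive(table):
    return sum(1 for count in table.values() if count > 0)

def d_edf(table):
    return (positive(table) - positive(marginal(table, (0, 1)))
            - positive(marginal(table, (1, 2))) + positive(marginal(table, (1,))))

def face_dims(table):
    """k_1 and k_2 from the counting formulas (|J_1| = 7, |J_2| = 5)."""
    zeros = lambda t: len(t) - positive(t)
    ab, bc, b = marginal(table, (0, 1)), marginal(table, (1, 2)), marginal(table, (1,))
    return 7 - zeros(table), 5 - zeros(ab) - zeros(bc) + zeros(b)

def rank(rows):
    """Rank by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    r, ncols = 0, (len(rows[0]) if rows else 0)
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_k2(table):
    """Affine dimension of the vertices f_i lying on every facet that contains the data."""
    ab, bc = marginal(table, (0, 1)), marginal(table, (1, 2))
    verts = [[int(all(c[v] == 1 for v in D)) for D in GENERATORS]
             for c in CELLS if ab[(c[0], c[1])] > 0 and bc[(c[1], c[2])] > 0]
    return rank([[x - y for x, y in zip(v, verts[0])] for v in verts[1:]])

table = dict(zip(CELLS, [3, 0, 1, 2, 0, 0, 4, 1]))   # hypothetical data
k1, k2 = face_dims(table)
print(d_edf(table), k1, k2, geometric_k2(table))
```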

In fact Proposition 4.1 can be extended to the following general result. Let $\mathcal{C}_i$ and $\mathcal{S}_i$ be the sets of cliques and separators of the decomposable model $J_i$, $i = 1, 2$. We define the effective degrees of freedom to be the following sum $d_{EDF}$:

\[ d_{EDF} = \sum_{C \in \mathcal{C}_1} \sum_{i_C \in I_C} \delta(n(i_C)) - \sum_{S \in \mathcal{S}_1} \sum_{i_S \in I_S} \delta(n(i_S)) - \Big(\sum_{C \in \mathcal{C}_2} \sum_{i_C \in I_C} \delta(n(i_C)) - \sum_{S \in \mathcal{S}_2} \sum_{i_S \in I_S} \delta(n(i_S))\Big). \]

Proposition 4.2 Consider two arbitrary decomposable graphical models $J_1$ and $J_2$ such that the data belongs to faces of dimension $k_1$ and $k_2$ of $\overline{C}_1$ and $\overline{C}_2$ respectively. Then, the following relation holds:

\[ d_{EDF} = k_1 - k_2. \]

The proof of this proposition follows parallel lines to the proof given above for the two particular models given in [20]. We therefore have a quick and easy way to know the behaviour of the Bayes factor between two decomposable models.

5 Facets of C for some hierarchical models 

We now turn our attention to the identification of the facets of $\overline{C}$. Knowing the facets of $\overline{C}$ is crucial since faces are intersections of facets. Facets of $\overline{C}$ have been much studied by geometers and in Section 5.3, we will recall some known results on these facets when the model is binary and governed by a cycle of order $n > 3$. But before doing so, we give two new results on facets of polytopes associated to our models. In Theorem 5.1 we identify a category of facets which is common to all discrete hierarchical models. In Corollary 5.1 we show that for decomposable graphical models, the only facets of $\overline{C}$ are given by the category of facets given in Theorem 5.1.

5.1 Facets common to all hierarchical models 

Let $\mathcal{D}$ be the set of subsets of $V$ defining the hierarchical model. Let $\mathcal{A}$ be the family of maximal elements of $\mathcal{D}$. For the subclass of graphical models Markov with respect to a graph $G$, $\mathcal{A}$ is the set of cliques of $G$. This set is traditionally denoted $\mathcal{C}$ but in this particular subsection, to avoid confusion between a clique $C \in \mathcal{C}$ and the polytope $C$, we use the notation $A \in \mathcal{A}$.

For each $D \in \mathcal{D}$ and each $j_0 \in J$ such that $S(j_0) \subset D$ define the affine forms

\[ g_{0,D}(m) = 1 + \sum_{j;\ S(j) \subset D} (-1)^{|S(j)|}\, m_j \tag{47} \]

\[ g_{j_0,D}(m) = \sum_{j;\ S(j) \subset D,\ j_0 \triangleleft j} (-1)^{|S(j)| - |S(j_0)|}\, m_j. \tag{48} \]

Of course, for $j_0 \in J$ the form $g_{j_0,D}$ is not only affine, but is also linear. In this subsection, we will use $g_{j_0,A}$ only for $A \in \mathcal{A}$ but, as we shall see in Subsection 5.2, the $g_{j_0,S}$ when $S$ is a minimal separator play an important role also, even though $S \notin \mathcal{A}$. In the next theorem, we consider the following affine hyperplanes of $\mathbb{R}^J$

\[ H(j, A) = \{m \in \mathbb{R}^J ;\ g_{j,A}(m) = 0\}, \quad j \in J \cup \{0\},\ A \in \mathcal{A}, \]

and we prove that

\[ F(j, A) = H(j, A) \cap \overline{C} \tag{49} \]

is a facet of the convex set $\overline{C}$ with extreme points $f_i = \sum_{j \in J,\ j \triangleleft i} e_j$. Recall that for $T \subset V$ the index set $I_T$ means $\prod_{v \in T} I_v$.

Theorem 5.1 Let $A \in \mathcal{A}$, where $\mathcal{A}$ is the set of maximal elements of $\mathcal{D}$ defining a general hierarchical model. Let $j_0 \in J \cup \{0\}$ be such that $S(j_0) \subset A$ and let $i \in I$. Then $g_{j_0,A}(f_i)$ can only take the values 0 or 1. More precisely, the following holds:

1. $g_{j_0,A}(f_i) = 1$ if and only if $j_0 \triangleleft i$ and $S(i) \cap A = S(j_0)$;

2. there are exactly $|I| - |I_{V \setminus A}|$ vectors $f_i$ such that $g_{j_0,A}(f_i) = 0$;

3. the set $F(j_0, A)$ as defined in (49) is a facet of the polytope $\overline{C}$.

Before giving the proof of this theorem, let us illustrate its results. We consider a simple decomposable model and list the various faces and the $f_i$'s that belong to them.

Example: the A 3 graph. Consider • — • — • and assume that we are in the binary 
case I a = It = I c = {0, 1}. For each F(j, A), we are going to list the /j's that belong 
to it. In this example, I is identified with the power set of V — {a, b, c} and J is 
identified with the set of nonempty complete subsets D of A 3 , namely a, b, c, ab, be. 
In a five-dimensional space with basis e a , e&, e c , e a b, e^c the eight vectors fr are 

0, fa Ca; fb G&j fc C c , 
fab 6 a ~\- €,}} ~\~ &abi fac C a -|- C c , fbc Cfe ~\~ Cc &ba fabc C a -\~ Gf, -\~ 6 C -\~ e a \, -\- 6ft c . 
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Since we have two cliques of size 2 the number of facets Fd,c is 2 2 + 2 2 = 8. They are 
described as follows (we adopt the following short notation : A is the set of T's 
contained in V such that /r £ Fd^a)- 

F lab = {«> b , ab, be, ac, abc}, F* ab = {0, b, c, ab, be, abc}, 

F b,ab = {0> a > C > ab > aC > ab °}> F ab,ab = {$> «) & > C > &C > flC }> 

F0 6c = {6, c, ab, be, ac, abc}, F* bc = {0, a, b, ab, be, abc}, 
F b*bc = a i c i be, ac, abc}, F bcbc = {0, a, b, c, ab, ac}. 
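
The eight lists above can be recomputed mechanically. The following is a minimal sketch, not part of the original argument: it assumes the binary identification of cells with subsets $T \subset V$, the convention $m_\emptyset = 1$ for the constant term of $g_{\emptyset,A}$, and the helper names `subsets`, `f`, `g` and `facet` are hypothetical.

```python
from itertools import combinations

V = "abc"
CLIQUES = ["ab", "bc"]  # maximal complete subsets of the chain with edges ab, bc


def subsets(s):
    """All subsets of s (a string or a set of vertices), as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# J: the nonempty complete subsets a, b, c, ab, bc
J = {D for A in CLIQUES for D in subsets(A) if D}


def f(T):
    """Coordinates of f_T: the coefficient of e_D is 1 exactly when D is inside T."""
    return {D: int(D <= T) for D in J}


def g(D0, A, m):
    """g_{D0,A}(m): sum over D0 <= D <= A of (-1)^(|D|-|D0|) m_D (assumed m_empty = 1)."""
    return sum((-1) ** (len(D) - len(D0)) * (m[D] if D else 1)
               for D in subsets(A) if D0 <= D)


def facet(D0, A):
    """Labels T of the vertices f_T lying on the hyperplane g_{D0,A} = 0."""
    return {"".join(sorted(T)) for T in subsets(V)
            if g(frozenset(D0), frozenset(A), f(T)) == 0}


for A in CLIQUES:
    for D0 in subsets(A):
        label = "".join(sorted(D0)) or "0"
        print(f"F_{{{label},{A}}} =", sorted(facet(D0, A), key=lambda t: (len(t), t)))
```

In this sketch the empty set $T = \emptyset$ is printed as an empty string; each of the eight sets has $|I| - |I_{V \setminus A}| = 8 - 2 = 6$ elements, in agreement with part 2 of Theorem 5.1.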

Proof of Theorem 5.1. The proof is long and we will only give it for $j_0 \in J$. We skip the case $j_0 = 0$ since it is entirely analogous. We have

$$g_{j_0,A}(f_i) = \sum_{j_0 \triangleleft j,\ S(j)\subset A} (-1)^{|S(j)|-|S(j_0)|} \Big\langle e_j, \sum_{j' \triangleleft i} e_{j'} \Big\rangle = \sum_{j_0 \triangleleft j \triangleleft i,\ S(j)\subset A} (-1)^{|S(j)|-|S(j_0)|}.$$

From (5) the set of $j \in J$ such that $j_0 \triangleleft j \triangleleft i$ is nonempty if and only if $j_0 \triangleleft i$. Thus $g_{j_0,A}(f_i) = 0$ if $j_0 \triangleleft i$ is false. Suppose now that $j_0 \triangleleft i$. Thus $S(j_0) \subset A \cap S(i)$. If $S(i) \cap A = S(j_0)$, then the only $j$ satisfying the conditions of the sum above is $j = j_0$ and clearly $g_{j_0,A}(f_i) = 1$. If $S(i) \cap A \neq S(j_0)$, then any $j$ such that $j_0 \triangleleft j \triangleleft i$ and $S(j) \subset S(i) \cap A$ can be written $j = (i_{S(j_0)}, i_{S(j) \setminus S(j_0)}, 0)$. This implies that the number

$$\sum_{j_0 \triangleleft j \triangleleft i,\ S(j)\subset A} (-1)^{|S(j)|-|S(j_0)|}$$

can be computed by the principle of inclusion-exclusion and is equal to zero. We have just proved Part 1 of the theorem, that is

$$g_{j_0,A}(f_i) = \begin{cases} 1 & \text{if } j_0 \triangleleft i \text{ and } S(i) \cap A = S(j_0), \\ 0 & \text{otherwise.} \end{cases} \tag{50}$$

From Part 1, the number of $f_i$ contained in $H(j_0, A)$ is equal to $|I|$ minus the number of $f_i$ such that $g_{j_0,A}(f_i) = 1$. Since $S(i) \cap A = S(j_0)$ and $j_0 \triangleleft i$, such $i$'s are identified by $i_{V \setminus A}$. Clearly their number is equal to $|I_{V \setminus A}|$. This proves Part 2.
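
Parts 1 and 2 can also be checked by brute force on a small non-binary case. The sketch below is only an illustration under stated assumptions: the level sets $\{0,1,2\} \times \{0,1\} \times \{0,1\}$ and the cliques $ab$, $bc$ are chosen to match the example of Subsection 5.2, cells are encoded as tuples, and all function names are hypothetical.

```python
import math
from itertools import product

LEVELS = (3, 2, 2)            # |I_a|, |I_b|, |I_c| (assumed example)
CLIQUES = [(0, 1), (1, 2)]    # vertex indices of the cliques ab and bc
CELLS = list(product(*(range(n) for n in LEVELS)))


def support(i):
    """S(i): the set of vertices where the cell i is nonzero."""
    return frozenset(v for v, x in enumerate(i) if x != 0)


def below(j, i):
    """j <| i: S(j) is contained in S(i) and j agrees with i on S(j)."""
    return all(x == 0 or x == y for x, y in zip(j, i))


def g_at_vertex(j0, A, i):
    """g_{j0,A}(f_i): signed count of the j with j0 <| j <| i and S(j) inside A."""
    A = frozenset(A)
    return sum((-1) ** (len(support(j)) - len(support(j0)))
               for j in CELLS
               if support(j) <= A and below(j0, j) and below(j, i))


def check_theorem_5_1():
    """Check Parts 1 and 2 for every admissible (j0, A); return the zero counts."""
    zero_counts = {}
    for A in CLIQUES:
        A_set = frozenset(A)
        n_rest = math.prod(n for v, n in enumerate(LEVELS) if v not in A_set)
        for j0 in CELLS:
            if not support(j0) <= A_set:
                continue
            ones = [i for i in CELLS if g_at_vertex(j0, A, i) == 1]
            zeros = [i for i in CELLS if g_at_vertex(j0, A, i) == 0]
            expected = [i for i in CELLS
                        if below(j0, i) and support(i) & A_set == support(j0)]
            assert len(ones) + len(zeros) == len(CELLS)   # values are 0 or 1 only
            assert ones == expected                        # Part 1
            assert len(zeros) == len(CELLS) - n_rest       # Part 2
            zero_counts[(j0, A)] = len(zeros)
    return zero_counts
```

Calling `check_theorem_5_1()` raises no assertion error in this example; the term $j = 0$ in the sum plays the role of the constant 1 of the affine form when $j_0 = 0$.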

Let us now prove Part 3. We know from Proposition 2.2 that the $f_i$ are the extreme points of $\overline{C}$. Therefore, since $g_{j_0,A}(f_i) \geq 0$ for all $i$ and $g_{j_0,A}(f_i) = 0$ for some $f_i$, $H(j_0, A)$ is a supporting hyperplane of $\overline{C}$.

The more delicate part of the theorem is to show that $F(j_0, A)$ is a facet of $\overline{C}$. This is equivalent to saying that if $j_0 \in J$ and $A \in \mathcal{A}$ are such that $S(j_0) \subset A$, then $H(j_0, A)$ contains enough points $f_i$ to affinely generate it. Since $j_0 \in J$ is not zero, $f_0 = 0$ is in $H(j_0, A)$, which is therefore a linear space. To prove that $F(j_0, A)$ is a facet of $\overline{C}$, we want to prove that it linearly generates $H(j_0, A)$. This is equivalent to proving the following statement.

Statement S: If $h \in H(j_0, A)$ is orthogonal to all elements $f_i \in H(j_0, A)$ then $h = 0$.

We write $h = \sum_{j \in J} h_j e_j$. We prove that $h_j = 0$ for all $j \in J$ in three steps.

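Step 1. We prove that if $h_j \neq 0$ then $j_0 \triangleleft j$. Let $\mathcal{N} = \{j \in J ;\ h_j \neq 0\}$. Let $j_1$ be a minimal element of $\mathcal{N}$ (we mean that $h_{j_1} \neq 0$ and that $h_j = 0$ for all $j \triangleleft j_1$ with $j \neq j_1$). Therefore

$$\langle f_{j_1}, h \rangle = \sum_{j \triangleleft j_1} h_j = h_{j_1} \neq 0.$$

Since $h$ is orthogonal to all $f_i \in H(j_0, A)$, we get that $f_{j_1}$ is not in $H(j_0, A)$ and, as we showed earlier in this proof, this implies that $j_0 \triangleleft j_1$. Now let $j$ be such that $h_j \neq 0$. There exists necessarily a minimal element $j_1$ of $\mathcal{N}$ such that $j_1 \triangleleft j$. Therefore $j_0 \triangleleft j_1 \triangleleft j$ and Step 1 is proved.

Step 2. We prove that if $j_0 \triangleleft j$ and $S(j) \subset A$ we have $h_j = 0$. Let $\varphi(j) = \sum_{j_0 \triangleleft j' \triangleleft j} h_{j'}$. If $j \neq j_0$ we have the following equalities:

$$\varphi(j) = \sum_{j_0 \triangleleft j' \triangleleft j} h_{j'} \overset{(1)}{=} \sum_{j' \triangleleft j} h_{j'} \overset{(2)}{=} \langle f_j, h \rangle \overset{(3)}{=} 0.$$

Indeed, (1) is a consequence of Step 1 and (2) holds by definition of $f_j$. For (3), we see that since $j \neq j_0$, $S(j) \cap A \neq S(j_0)$ and therefore, by (50), $f_j \in H(j_0, A)$. Since $h$ is orthogonal to any $f_i$ in $H(j_0, A)$, (3) follows. However, if $j = j_0$ then $\varphi(j_0) = h_{j_0}$. The inclusion-exclusion principle applied to $\varphi(j)$ yields, for $j_0 \triangleleft j$ and $S(j) \subset A$,

$$h_j = \sum_{j_0 \triangleleft j' \triangleleft j} (-1)^{|S(j)|-|S(j')|} \varphi(j') = (-1)^{|S(j)|-|S(j_0)|} h_{j_0}. \tag{51}$$

We now use the hypothesis $h \in H(j_0, A)$, that is

$$0 = \langle g_{j_0,A}, h \rangle = \sum_{j_0 \triangleleft j,\ S(j)\subset A} (-1)^{|S(j)|-|S(j_0)|} h_j = h_{j_0} \sum_{j_0 \triangleleft j,\ S(j)\subset A} 1.$$

As a consequence $h_{j_0} = 0$ and (51) gives Step 2.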

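Step 3. We prove that if $j_0 \triangleleft j$ and $S(j) \not\subset A$ we have $h_j = 0$. Once we prove this, Statement S will be shown and this will complete the proof of Theorem 5.1. We prove Step 3 by induction on the size $k$ of the set $S(j) \setminus A$.

For $k = 0$, it is Step 2. To understand the principle of the proof it is wise to give this proof first for $k = 1$. Although throughout the paper the symbol $i' \triangleleft i$ was used only when $i' \in J$, it makes sense even if $i'$ and $i$ are in $I$ and we can write

$$\langle f_i, h \rangle = \sum_{i' \triangleleft i} h_{i'}$$

with the convention

$$h_{i'} = 0 \quad \text{for } i' \in I \setminus J. \tag{52}$$

We now fix $i \in I$ such that $j_0 \triangleleft i$ and $S(i) = A \cup \{v\}$ where $v \in V \setminus A$. For $T \subset S(i)$, we write $j_T$ for the unique $i' \triangleleft i$ with support $T$. For $S_1 \subset A \setminus S(j_0)$ consider the unique $i(S_1)$ such that $j_0 \triangleleft i(S_1) \triangleleft i$ and $S(i(S_1)) = S(j_0) \cup S_1 \cup \{v\}$. Define now

$$\varphi_v(S_1) = \langle f_{i(S_1)}, h \rangle \overset{(1)}{=} \sum_{j_0 \triangleleft i' \triangleleft i(S_1)} h_{i'} \overset{(2)}{=} \sum_{S \subset S_1} h_{j_{S(j_0) \cup S \cup \{v\}}}.$$

In this equality, (1) follows from Step 1 and (52) while (2) follows from two remarks. The first remark is that if $j_0 \triangleleft i' \triangleleft i(S_1)$ then $i'$ is entirely determined by its support $S(i')$ since $i' \triangleleft i$. This support has two possible forms: either $S(i') = S(j_0) \cup S$ or $S(i') = S(j_0) \cup S \cup \{v\}$, with $S \subset S_1$. The second remark is that if $i'$ has a support of the form $S(i') = S(j_0) \cup S$ then $h_{i'} = 0$. This follows from Step 2 if $i' \in J$ and from (52) if $i' \notin J$. From (50), since $j_0 \triangleleft i(S_1)$, the vector $f_{i(S_1)}$ belongs to $H(j_0, A)$ if and only if $S(i(S_1)) \cap A \neq S(j_0)$, that is if and only if $S_1 \neq \emptyset$; hence $\varphi_v(S_1) = \langle f_{i(S_1)}, h \rangle = 0$ whenever $S_1 \neq \emptyset$. Moreover, $\varphi_v(\emptyset) = h_{j_{S(j_0) \cup \{v\}}}$. The inclusion-exclusion principle applied to $\varphi_v(S)$ therefore implies that

$$h_{j_{S(j_0) \cup S \cup \{v\}}} = (-1)^{|S|}\, h_{j_{S(j_0) \cup \{v\}}}$$

for all $S \subset A \setminus S(j_0)$. We now apply this last equality to $S = A \setminus S(j_0)$ itself. Because of the maximality of $A \in \mathcal{A}$, the set $A \cup \{v\}$ is not in $\mathcal{D}$ and therefore $h_{j_{A \cup \{v\}}} = 0$ from our convention. As a consequence $h_{j_{S(j_0) \cup \{v\}}} = 0$ and also $h_j = 0$ for all $j_0 \triangleleft j \triangleleft i$.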
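
The inclusion-exclusion step used here is Möbius inversion on a lattice of subsets. The sketch below illustrates it on a hypothetical two-element ground set standing in for $A \setminus S(j_0)$; the names `zeta` and `mobius` are assumptions introduced only for this illustration.

```python
from itertools import combinations


def subsets(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def zeta(h, ground):
    """phi(S1) = sum of h(S) over all S contained in S1."""
    return {S1: sum(h[S] for S in subsets(S1)) for S1 in subsets(ground)}


def mobius(phi, ground):
    """h(S) = sum over S' contained in S of (-1)^(|S|-|S'|) phi(S')."""
    return {S: sum((-1) ** (len(S) - len(Sp)) * phi[Sp] for Sp in subsets(S))
            for S in subsets(ground)}


ground = frozenset({"x", "y"})  # hypothetical stand-in for A \ S(j0)

# Special case used in the proof: phi vanishes except at the empty set.
phi = {S: (5 if not S else 0) for S in subsets(ground)}
h = mobius(phi, ground)
# Here h(S) = (-1)^|S| * phi(empty set) for every S.
```

In the proof, $h$ at the largest set $S = A \setminus S(j_0)$ is known to be zero, so the relation $h(S) = (-1)^{|S|} \varphi(\emptyset)$ forces $\varphi(\emptyset) = 0$ and then $h(S) = 0$ for every $S$.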

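This settles the case where $S(i) = A \cup \{v\}$ with $v \in V \setminus A$. We now make the following induction hypothesis on $k$: if $i \in I$ is such that $A \subset S(i)$ and such that $S(i) \setminus A$ has $k$ elements, then $h_j = 0$ for all $j \in J$ such that $j_0 \triangleleft j \triangleleft i$. We assume that this induction hypothesis is true up to $k - 1$. We denote

$$S(i) = A \cup \{v_1, \ldots, v_k\}.$$

For $S_1 \subset A \setminus S(j_0)$ consider $i(S_1)$ such that $j_0 \triangleleft i(S_1) \triangleleft i$ and defined by $S(i(S_1)) = S(j_0) \cup S_1 \cup \{v_1, \ldots, v_k\}$. We now fix a subset $R$ of $\{v_1, \ldots, v_k\}$. Define

$$\varphi_R(S_1) = \sum_{S \subset S_1} h_{j_{S(j_0) \cup S \cup R}}.$$

We have

$$\langle f_{i(S_1)}, h \rangle = \sum_{j_0 \triangleleft i' \triangleleft i(S_1)} h_{i'} = \sum_{R \subset \{v_1, \ldots, v_k\}} \varphi_R(S_1).$$

Now the induction hypothesis implies that $\varphi_R(S_1) = 0$ if $|R| < k$. We get that $\langle f_{i(S_1)}, h \rangle = \varphi_{\{v_1, \ldots, v_k\}}(S_1)$. Since $j_0 \triangleleft i(S_1)$, the vector $f_{i(S_1)}$ belongs to $H(j_0, A)$ if and only if $S(i(S_1)) \cap A \neq S(j_0)$, that is if and only if $S_1 \neq \emptyset$; hence $\varphi_{\{v_1, \ldots, v_k\}}(S_1) = \langle f_{i(S_1)}, h \rangle = 0$ whenever $S_1 \neq \emptyset$. Similarly to the case $k = 1$ we have $\varphi_{\{v_1, \ldots, v_k\}}(\emptyset) = h_{j_{S(j_0) \cup \{v_1, \ldots, v_k\}}}$. The inclusion-exclusion principle therefore implies that

$$h_{j_{S(j_0) \cup S \cup \{v_1, \ldots, v_k\}}} = (-1)^{|S|}\, h_{j_{S(j_0) \cup \{v_1, \ldots, v_k\}}}$$

for all $S \subset A \setminus S(j_0)$. Again $h_{j_{A \cup \{v_1, \ldots, v_k\}}} = 0$ since $A \cup \{v_1, \ldots, v_k\}$ is not in $\mathcal{D}$, and this leads to $h_j = 0$ if $j_0 \triangleleft j \triangleleft i$. The induction is extended and Statement S is proved, as well as Theorem 5.1. □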
5.2 Facets of C when G is decomposable

When the graph $G$ is decomposable, the normalizing constant $I(m, \alpha)$ is the normalizing constant of the hyper Dirichlet as defined in [6]. In the theorem below, we restate, in our present notation, the expression of $I(m, \alpha)$ as given in [IB], Formula (4.8), and directly derive the form of $J_C(m)$ for decomposable models. A corollary giving the facets of $\overline{C}$ when the model is decomposable follows immediately from the theorem.

Theorem 5.2 Let $(V, E)$ be a decomposable graph, let $\mathcal{D}$ be the family of the complete subsets of $V$, let $\mathcal{C}$ be the family of its cliques, let $\mathcal{S}$ be the family of its minimal separators and let $\nu(S)$ be the multiplicity of the minimal separator $S$. Then for $m$ in the interior $C$ of the convex hull of the $f_i$'s we have

$$I(m, \alpha) = \int_{\mathbb{R}^J} e^{\alpha \langle \theta, m \rangle} L(\theta)^{-\alpha}\, d\theta = \frac{\prod_{C \in \mathcal{C}} \Gamma(\alpha g_{0,C}(m)) \prod_{\{j \in J;\ S(j) \subset C\}} \Gamma(\alpha g_{j,C}(m))}{\Gamma(\alpha) \prod_{S \in \mathcal{S}} \Big( \Gamma(\alpha g_{0,S}(m)) \prod_{\{j \in J;\ S(j) \subset S\}} \Gamma(\alpha g_{j,S}(m)) \Big)^{\nu(S)}} \tag{53}$$

and

$$\lim_{\alpha \to 0} \alpha^{|J|} I(m, \alpha) = J_C(m) = \frac{\prod_{S \in \mathcal{S}} \Big( g_{0,S}(m) \prod_{\{j \in J;\ S(j) \subset S\}} g_{j,S}(m) \Big)^{\nu(S)}}{\prod_{C \in \mathcal{C}} g_{0,C}(m) \prod_{\{j \in J;\ S(j) \subset C\}} g_{j,C}(m)}. \tag{54}$$
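
The passage from (53) to (54) can be sketched as follows. This is a reading aid under the assumption that the only ingredient is the behaviour of the gamma function near zero; the counting symbols $N_{\mathcal{C}}$ and $N_{\mathcal{S}}$ are introduced here for illustration only.

```latex
% Since \Gamma(x+1) = x\,\Gamma(x) and \Gamma(1) = 1, for fixed x > 0
\Gamma(\alpha x) = \frac{\Gamma(1 + \alpha x)}{\alpha x} \sim \frac{1}{\alpha x}
\qquad (\alpha \to 0).
% Let N_C be the number of gamma factors in the numerator of (53) and
% 1 + N_S the number in its denominator (the extra 1 counts \Gamma(\alpha)):
N_{\mathcal{C}} = \sum_{C \in \mathcal{C}} |I_C|, \qquad
N_{\mathcal{S}} = \sum_{S \in \mathcal{S}} \nu(S)\, |I_S| .
% Replacing every gamma factor by its equivalent gives
I(m,\alpha) \sim \alpha^{-(N_{\mathcal{C}} - N_{\mathcal{S}} - 1)}\,
\frac{\prod_{S \in \mathcal{S}} \bigl( g_{0,S}(m) \prod_{j} g_{j,S}(m) \bigr)^{\nu(S)}}
     {\prod_{C \in \mathcal{C}} g_{0,C}(m) \prod_{j} g_{j,C}(m)},
% which is (54) provided N_C - N_S - 1 = |J|.
```

For the chain with cliques $ab$, $bc$ and separator $b$ on $I = \{0,1,2\} \times \{0,1\} \times \{0,1\}$, one finds $N_{\mathcal{C}} = 6 + 4 = 10$, $N_{\mathcal{S}} = 2$ and $|J| = 7$, so the exponent is indeed $10 - 2 - 1 = 7$.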



Corollary 5.1 In the case of a hierarchical model associated to a decomposable graph, all the facets of $\overline{C}$ are of the type $F(j_0, C)$ described in Theorem 5.1 with $j_0 \in J \cup \{0\}$, with $C$ in the set $\mathcal{C}$ of cliques and $S(j_0) \subset C$.

Proof: We know from Theorem 5.1 that the affine forms in the denominator of $J_C(m)$ in (54) define facets of $\overline{C}$. From Theorem 3.1 we know that they are the only ones. □

In fact we conjecture, as mentioned in the introduction, that if a model is such that the only facets of $\overline{C}$ are of the type given in Theorem 5.1, then it is a decomposable graphical model.



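Example. If $V = \{a, b, c\}$ with cliques $ab$ and $bc$, and if $I = \{0, 1, 2\} \times \{0, 1\} \times \{0, 1\}$, we have

$$\begin{aligned}
g_{0,bc}(m) &= 1 - m_{001} - m_{010} + m_{011}, \\
g_{001,bc}(m) &= m_{001} - m_{011}, \\
g_{010,bc}(m) &= m_{010} - m_{011}, \\
g_{011,bc}(m) &= m_{011}, \\
g_{0,ab}(m) &= 1 - m_{100} - m_{200} - m_{010} + m_{110} + m_{210}, \\
g_{100,ab}(m) &= m_{100} - m_{110}, \\
g_{200,ab}(m) &= m_{200} - m_{210}, \\
g_{010,ab}(m) &= m_{010} - m_{110} - m_{210}, \\
g_{110,ab}(m) &= m_{110}, \\
g_{210,ab}(m) &= m_{210}, \\
g_{0,b}(m) &= 1 - m_{010}, \\
g_{010,b}(m) &= m_{010}.
\end{aligned}$$

In this case $I(m, \alpha)$ is a quotient: the numerator is the product of 10 gamma functions and the denominator is $\Gamma(\alpha)\Gamma(\alpha(1 - m_{010}))\Gamma(\alpha m_{010})$. As a consequence $J_C(m)$ is

$$J_C(m) = \frac{g_{0,b}(m)\, g_{010,b}(m)}{g_{0,bc}(m)\, g_{001,bc}(m)\, g_{010,bc}(m)\, g_{011,bc}(m)\, g_{0,ab}(m)\, g_{100,ab}(m)\, g_{200,ab}(m)\, g_{010,ab}(m)\, g_{110,ab}(m)\, g_{210,ab}(m)}.$$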
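
The limit (54) can be checked numerically on this example. The sketch below is an illustration only: the point `m` is a hypothetical choice (the mean of the uniform distribution on the 12 cells, which lies in the interior $C$), the exponent $|J| = 7$ is specific to this example, and the function names are assumptions.

```python
import math

# Hypothetical interior point: marginal probabilities of the uniform law on I.
m = {"100": 1 / 3, "200": 1 / 3, "010": 1 / 2, "001": 1 / 2,
     "110": 1 / 6, "210": 1 / 6, "011": 1 / 4}


def affine_forms(m):
    """Values of the twelve forms g_{j,D}(m): (clique forms, separator forms)."""
    bc = [1 - m["001"] - m["010"] + m["011"],
          m["001"] - m["011"],
          m["010"] - m["011"],
          m["011"]]
    ab = [1 - m["100"] - m["200"] - m["010"] + m["110"] + m["210"],
          m["100"] - m["110"],
          m["200"] - m["210"],
          m["010"] - m["110"] - m["210"],
          m["110"],
          m["210"]]
    b = [1 - m["010"], m["010"]]
    return ab + bc, b


def J_C(m):
    """Right-hand side of (54): separator product over clique product."""
    cliques, sep = affine_forms(m)
    return math.prod(sep) / math.prod(cliques)


def log_I(m, alpha):
    """Logarithm of the gamma quotient (53), computed with lgamma for stability."""
    cliques, sep = affine_forms(m)
    return (sum(math.lgamma(alpha * g) for g in cliques)
            - math.lgamma(alpha)
            - sum(math.lgamma(alpha * g) for g in sep))


def scaled_I(m, alpha, dim_J=7):
    """alpha^{|J|} I(m, alpha), with |J| = 7 in this example."""
    return math.exp(dim_J * math.log(alpha) + log_I(m, alpha))


print(J_C(m), scaled_I(m, 1e-6))
```

Working with `math.lgamma` rather than `math.gamma` avoids overflow when the arguments $\alpha g$ are very small; the two printed numbers should agree to several significant digits.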

5.3 Facets of C when the model is binary and governed by a cycle

For the sake of completeness and for the convenience of the reader, we recall some known results giving the facets of the polytope $\overline{C}$ when the model is hierarchical, binary and governed by a cycle $G$ of order $n \geq 3$. The reader is referred to [5] and [13] and some references therein for an explicit description of these facets. In this subsection, we simply translate the equations of the facets given in these papers into our own coordinates. The results are given in the following theorem. The coordinates of $m \in \mathbb{R}^J$ will be denoted $m_v$ if they are indexed by a vertex $v \in V$ and $m_e$ if they are indexed by an edge $e \in E$.

Theorem 5.3 Let $G = (V, E)$ be a cycle of order $n \geq 3$. Assume the hierarchical model is binary and governed by $G$, that is $\mathcal{D} = \{v \in V,\ e \in E\}$. Then the polytope $\overline{C}$ is defined by the following inequalities and the facets are defined by the corresponding equalities:

1. for any edge $(a, b) \in E$,

$$m_{ab} \geq 0, \tag{55}$$
$$m_a - m_{ab} \geq 0, \tag{56}$$
$$m_b - m_{ab} \geq 0, \tag{57}$$
$$1 - m_a - m_b + m_{ab} \geq 0, \tag{58}$$

2. for any subset $F \subset E$ with odd cardinality $|F|$,

$$\sum_{(a,b) \in F} (m_a + m_b - 2m_{ab}) - \sum_{(a,b) \in E \setminus F} (m_a + m_b - 2m_{ab}) \leq |F| - 1. \tag{60}$$

The number of facets of type (60) for the polytope $\overline{C}$ of the model governed by the cycle of order $n$ is $F_n = \sum_{k \in \mathbb{N},\ k \text{ odd},\ k \leq n} \binom{n}{k} = 2^{n-1}$, in addition to the $4n$ facets of type (55)-(58).

We see that the facets given by the first four inequalities are those described in Theorem 5.1 corresponding to the cliques $\{(a, b) \in E\}$, while the others are specific to models governed by a cycle. We illustrate this theorem in the case of the cycles of order 3, 4 and 5. We will not repeat the facets (55)-(58) common to all hierarchical models. We will give the facets of type (60) only.

For $n = 3$, let $V = \{a, b, c\}$ and $E = \{(a, b), (b, c), (c, a)\}$. The four facets of type (60) are

$$1 - m_a - m_b - m_c + m_{ab} + m_{bc} + m_{ac} \geq 0,$$
$$m_{ab} + m_c - m_{bc} - m_{ac} \geq 0 \tag{61}$$

and the other two facets obtained from (61) by permutations of the edges of $G$.

For $n = 4$, let $V = \{a, b, c, d\}$ and $E = \{(a, b), (b, c), (c, d), (d, a)\}$. The eight facets of type (60) are

$$1 - m_b - m_c + m_{ab} + m_{bc} + m_{cd} - m_{da} \geq 0,$$
$$m_c + m_d + m_{ab} - m_{bc} - m_{cd} - m_{da} \geq 0 \tag{62}$$

and the other three facets obtained from each of these two inequalities by permutations of the edges of $G$.

For $n = 5$, let $V = \{a, b, c, d, e\}$ and $E = \{(a, b), (b, c), (c, d), (d, e), (e, a)\}$. The sixteen facets of type (60) are

$$m_{ab} + m_c + m_d + m_e - m_{bc} - m_{cd} - m_{de} - m_{ea} \geq 0, \tag{63}$$
$$1 - m_a - m_b + m_{ea} + m_{ab} + m_{bc} + m_d - m_{cd} - m_{ed} \geq 0, \tag{64}$$
$$1 - m_d + m_{ab} + m_{cd} + m_{de} - m_{bc} - m_{ae} \geq 0, \tag{65}$$
$$2 - m_a - m_b - m_c - m_d - m_e + m_{ab} + m_{bc} + m_{cd} + m_{de} + m_{ea} \geq 0$$

and the other four facets obtained from each of (63), (64) and (65) by permutations of the edges of $G$.
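
The inequalities of type (60) can be checked by brute force on the vertices of the polytope. The sketch below is an illustration under stated assumptions: vertices are encoded as binary vectors $x \in \{0,1\}^n$ with $m_a = x_a$ and $m_{ab} = x_a x_b$, the inequality is taken in the form $\sum_{F} - \sum_{E \setminus F} \leq |F| - 1$, and the function names are hypothetical.

```python
from itertools import combinations, product


def cycle_edges(n):
    """Edges (v, v+1 mod n) of the cycle of order n."""
    return [(v, (v + 1) % n) for v in range(n)]


def odd_subsets(edges):
    """All subsets F of the edge set with odd cardinality."""
    return [F for k in range(1, len(edges) + 1, 2)
            for F in combinations(edges, k)]


def lhs(F, edges, x):
    """Left-hand side of the cycle inequality at the vertex with m_a = x_a, m_ab = x_a x_b."""
    def d(a, b):
        return x[a] + x[b] - 2 * x[a] * x[b]   # equals 1 exactly when x_a != x_b
    return sum(d(a, b) if (a, b) in F else -d(a, b) for a, b in edges)


def check_cycle(n):
    """Return (number of odd subsets, whether every inequality is valid and tight)."""
    edges = cycle_edges(n)
    families = odd_subsets(edges)
    ok = all(
        max(lhs(F, edges, x) for x in product((0, 1), repeat=n)) == len(F) - 1
        for F in families
    )
    return len(families), ok


for n in (3, 4, 5):
    print(n, check_cycle(n))
```

Validity on the vertices implies validity on their convex hull; tightness at some vertex is only a necessary condition for an inequality to define a facet, so this check does not replace the results recalled in Theorem 5.3.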

6 Conclusion 

The main contribution of this paper is the description of the behaviour of the Bayes factor as $\alpha \to 0$. We have shown that, in this study, the important concept is the dimension of the face to which the data belongs rather than the dimension of the model. We have identified the role of the open convex polytope $C$ and the function $J_C(\cdot)$. It is not surprising that $\overline{C}$, the convex hull of the support of the generating measure of the multinomial for the hierarchical model, plays an important role. The multinomial for a loglinear hierarchical model is a natural exponential family and the role of $C$, which is the domain of the mean, is well known. The set $C$ is also of prime importance in the study of the existence of the maximum likelihood estimate of the parameter (see for example Eriksson et al. [9], Geiger et al. [11] or Rinaldo [18]). However, the role of the characteristic function $J_C(\cdot)$ of $C$ has only been uncovered now in the study of the Bayes factor, and we can add $J_C$ to the toolkit of exponential families. We note that all the limit theorems in Section 3 are valid for any natural exponential family such that the convex hull of the support of its generating measure is a bounded convex polytope, but are not immediately applicable to a family of distributions such as the Poisson where $C$ is not bounded. This is the topic of further work.

A secondary contribution of this paper is our results on the identification of the facets of a polytope. We have two new results for polychotomous models (i.e. not necessarily binary): the first giving a particular category of facets common to all hierarchical models, the second giving the complete set of facets for decomposable models.

We have also extended the results of [20] to the case of any two decomposable models, thus allowing the practitioner to predict the behaviour of the Bayes factor without using the concept of face or facets of a polytope.